AD-759  712 


COMPUTER  IDENTIFICATION  OF  TEXTURED 
VISUAL  SCENES 

Ruzena  Bajcsy 

Stanford  University 


Prepared  for: 

Advanced  Research  Projects  Agency 


October  1972 


DISTRIBUTED  BY: 


National Technical Information Service
U.  S.  DEPARTMENT  OF  COMMERCE 

5285  Port  Royal  Road,  Springfield  Va.  22151 




AD  759712 


STANFORD  ARTIFICIAL  INTELLIGENCE  LABORATORY 
MEMO  AIM-180 

STAN-CS-72-321 


COMPUTER  IDENTIFICATION  OF  TEXTURED 
VISUAL  SCENES 


BY 

RUZENA  BAJCSY 




SUPPORTED  BY 

ADVANCED  RESEARCH  PROJECTS  AGENCY 
ARPA  ORDER  NO.  457 


OCTOBER 1972



COMPUTER  SCIENCE  DEPARTMENT 
School  of  Humanities  and  Sciences 


STANFORD UNIVERSITY


Unclassified 

Security  Classification 


DOCUMENT  CONTROL  DATA  -  R  &  D 

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY (Corporate author)    2a. REPORT SECURITY CLASSIFICATION

Stanford University    Unclassified

Computer Science Department    2b. GROUP

Stanford, California 94305

3  REPORT  TITLE 

Computer  Identification  of  Textured  Visual  Scenes 


4. DESCRIPTIVE NOTES (Type of report and inclusive dates)

Technical, October 1972

5. AUTHOR(S) (First name, middle initial, last name)

Ruzena  Bajcsy 


6. REPORT DATE

October 1972

CONTRACT  OR  GRANT  NO. 

SD-183 

b. PROJECT NO.

ARPA  Order  No.  457 


10.  DISTRIBUTION  STATEMENT 


Distribution  Unlimited. 


7a. TOTAL NO. OF PAGES | 7b. NO. OF REFS


9a. ORIGINATOR'S REPORT NUMBER(S)


STAN-CS-72-321 


9b. OTHER REPORT NO(S) (Any other numbers that may be assigned
this report)


AIM180 


11. SUPPLEMENTARY NOTES


12.  SPONSORING  MILITARY  ACTIVITY 


Details of illustrations in
this document may be better
studied on microfiche.



DD FORM 1473 (PAGE 1)

Unclassified
Security Classification


STANFORD  ARTIFICIAL  INTELLIGENCE  LABORATORY  OCTOBER  1972 

MEMO  AIM-180 

COMPUTER  SCIENCE  DEPARTMENT 
REPORT  CS-  321 


COMPUTER  IDENTIFICATION  OF  TEXTURED  VISUAL  SCENES 

by 

Ruzena  Bajcsy 

ABSTRACT: This work deals with computer analysis of textured outdoor
scenes involving grass, trees, water and clouds. Descriptions
of texture are formalized from natural language descriptions;
local descriptors are obtained from the directional and
nondirectional components of the Fourier transform power spectrum.
Analytic expressions are obtained for orientation, contrast,
size, spacing, and, in periodic cases, the locations of texture
elements. These local descriptors are defined over windows of
various sizes; the choice of sizes is made by a simple
higher-level program.

The process of region growing is represented by a
sheaf-theoretical model which formalizes the operation of pasting
local structure (over a window) into global structure (over a
region). Programs were implemented which form regions of
similar color and similar texture with respect to the local
descriptors.
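The region-growing step can be illustrated with a small sketch. It assumes, purely for illustration, that each window has already been reduced to a single scalar descriptor (for instance a dominant orientation or a mean color value) stored in a grid, and it merges 4-connected neighboring windows whose descriptors differ by at most a tolerance `tol`. The function name `grow_regions` and this similarity rule are hypothetical; this is not the program described in this report, and it ignores the sheaf-theoretic formalization.

```python
from collections import deque

import numpy as np


def grow_regions(desc, tol):
    """Label 4-connected regions of a grid of window descriptors.

    Two neighbouring cells join the same region when their descriptor
    values differ by at most `tol` (a hypothetical similarity rule).
    """
    h, w = desc.shape
    labels = np.full((h, w), -1, dtype=int)   # -1 marks "not yet assigned"
    next_label = 0
    for sr in range(h):
        for sc in range(w):
            if labels[sr, sc] != -1:
                continue
            # Start a new region at this seed and flood outward breadth-first.
            labels[sr, sc] = next_label
            queue = deque([(sr, sc)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and labels[nr, nc] == -1
                            and abs(desc[nr, nc] - desc[r, c]) <= tol):
                        labels[nr, nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels


windows = np.array([[0.10, 0.12, 0.90, 0.88],
                    [0.11, 0.10, 0.89, 0.90]])
print(grow_regions(windows, tol=0.1))
```

Here the two left columns of windows form one region and the two right columns another. Comparing each cell with its neighbor, rather than with a running region average, lets a region drift gradually across a slow gradient; which behavior is preferable depends on whether such gradients should be kept inside one region.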

An interpretation is made of texture gradient as distance
gradient in space. A simple world model is described. Texture
regions and texture gradients are interpreted through a simulated
correspondence with the world model. We find that a
problem-solving approach, involving hypothesis-verification, is
more satisfactory than an earlier pattern recognition effort
(Bajcsy 1970), and that it is more crucial in work with complex
scenes than with scenes of polyhedra. Geometric clues from relative
sizes, texture gradients, and interposition are important in
interpretation.


This research was supported in part by the Advanced Research Projects
Agency of the Office of the Secretary of Defense under Contract No. SD-183.

The  views  and  conclusions  contained  in  this  document  are  those  of  the 
author  and  should  not  be  interpreted  as  necessarily  representing  the 
official  policies,  either  expressed  or  implied,  of  the  Advanced  Research 
Projects  Agency  or  the  U.S.  Government. 

Reproduced  in  the  USA.  Available  from  the  National  Technical  Information 
Service, Springfield, Virginia 22151.




ACKNOWLEDGEMENTS 


This study would neither have been started nor completed without
the advice and endless support and encouragement of my teacher,
Professor John McCarthy.

I  am  deeply  indebted  to  Dr.  Tom  Binford.  He  taught  me  how  to 
handle  computer  vision  problems  from  the  A. I.  point  of  view  and  helped 
me  to  clarify  many  confusions.  His  unfailing  interest  in  the  progress 
of  my  research  has  played  a  large  part  in  bringing  this  work  to  a 
successful  conclusion. 

I gratefully acknowledge consultations with Professor Robert W. Floyd.

The  discussions  with  Professor  Jerome  Feldman  and  the  interest 
shown  by  Professor  Cordell  Green  are  also  acknowledged  with  gratitude. 

My  work  was  dependent  on  the  programming  and  system  support  of 
many  people.  Notably  Dan  Swinehart  and  Lynn  Quam  contributed  greatly 
in  several  ways. 

I  wish  to  express  my  thanks  also  to  Professor  Ladislav  Gvozdjak 
from  the  Slovak  Technical  University,  Bratislava,  whose  personal  support 
brought  me  to  Stanford. 

I am grateful to Dr. Zoltan Domotor, who introduced me to sheaf
theory.

Finally,  the  major  financial  support  received  from  the  Artificial 
Intelligence  Project  at  Stanford  University  during  my  academic  program 
at  Stanford  is  also  acknowledged  with  gratitude. 



TABLE OF CONTENTS


1. INTRODUCTION

1.1 The Statement of the Problem

1.2 The Outline of This Thesis

1.3 Previous Results

1.4 The Contribution of This Research

2. TEXTURE DESCRIPTION

2.1 Examples of Outdoor Scenes

2.2 Textured Regions and Textured Elements

2.3 Textured Regions and Their Organization

3. PROCEDURES FOR TEXTURE DESCRIPTORS

3.1 Texture Descriptors Derived in the Spatial Domain

3.2 Texture Descriptors Derived in the Fourier Domain

3.3 Concrete Texture Descriptors: Local Descriptors

4. COLORED AND TEXTURED REGIONS

4.1 An Algorithm for Finding Regions

4.2 Texture Regions

4.3 Color Regions

4.4 A Sheaf-Theoretic Point of View of Finding Regions

5. INTERPRETATION OF OUTDOOR SCENES

5.1 Pattern Recognition Approach

5.2 Texture Gradient

5.3 The World Model

5.4 The Higher Level Program

6. CONCLUSIONS

APPENDIX: TOPOLOGICAL MODEL

REFERENCES


1.  INTRODUCTION 


1.1 Statement of Problem

We intend to deal in this thesis with techniques for computer
understanding of scenes with texture. We consider examples of outdoor
scenes, although textured surfaces appear in almost every sort of scene,
and we show some examples of isolated and artificial textures. Studies
in computer vision are motivated by a wide range of applications. Those
involving texture include agricultural surveys and the analysis of earth
resources satellite pictures. Planetary exploration by remotely controlled
vehicles will demand some autonomous vision because of long delay times.

The social benefits of computer-controlled cars have been described by
McCarthy. Industrial robots will soon acquire vision. Texture synthesis,
to which we feel our techniques are applicable, is useful in computer-aided
design and computer-aided art. Interpretation of scanning electron
microscope pictures, e.g. for metallurgy, may be of interest. We are also
interested in constructing a model of human perception. Finally, vision
is one of the more interesting problem areas within artificial intelligence,
and it contributes to advancing our understanding of intelligent systems.

Without undertaking a complete review of the literature, we would
like to contrast our work broadly with other work in computer vision.
Several small groups have studied perception of polyhedra.
Their work has been concerned with three-dimensional objects with plane,
uniform faces. The limited success of these efforts has depended to a
large extent upon large homogeneous areas and isolated edges. A number of
prediction-verification techniques have arisen, some of which are special
to the simple cases considered there. Others are more general and useful
to our work. Because of the complexity of textured scenes, we feel that
the prediction-verification approach to perceptual systems is even more
important for our work.

Some work has been done in image processing, intended to make images
of particular value easier for humans to interpret [Quam, others]. Other
work has been directed toward crop identification and other statistical
summaries of the earth's surface. These studies have also made limited
progress and are mentioned in the survey below. However, there is much
room for improvement of texture description, and those studies completely
ignore scenes where the three-dimensional character is important.

We should make clear what we are really after in an interpretation of
a scene. The goal is not only to get a map of colored and textured
regions. We are not merely after identification of some image as a
member of a class; that is, we are not out to identify the letter A.
Nor do we wish to identify some region of the image as some previously
seen element, although this might help us to achieve our goal. We have
in mind a system with a task, to navigate, for example, whose execution
requires understanding of the structure of the space portrayed
by the image.

1.2 Outline of Thesis

In Chapter 1, after the statement of our problem, we present
a review of the literature that we think is relevant to an analysis
of visual texture. The literature covered in this review comes from three
different sources: psychology, neurophysiology and computer science.
By no means is this review exhaustive. However, through the psychological
and neurophysiological review we hope to show the reader which
features are important in grouping, and thus to justify the features that
we use for texture description. The computer science review includes
pattern recognition, linguistic and analytic approaches.

In the second chapter, instead of presenting a formal
definition of texture, which we do not believe is possible in general,
we describe two concrete scenes with textured and colored regions. With
these examples, we describe our representation of texture and of a real
scene.

The third chapter presents the implementation of procedures
which give us texture descriptors. We discuss operators in the spatial
domain, that is, edge and region operators. We discuss some of the
possible techniques and the problems to be encountered in extending these
techniques to textured scenes. Then we discuss texture descriptors
derived in the Fourier domain. Directionality turns out to be one of the
most useful features, easily detectable in the Fourier domain. We find
that the Fourier technique has many problems, and we analyze the advantages
and limitations of these descriptors. We show how to compute the size of
texture elements and their contrast; these analytic expressions are
evaluated for several examples and appear quite useful.

Chapter four describes a region growing algorithm applied to
forming textured regions and colored regions. We present a sheaf-theoretic
point of view which provides a precise specification of conditions for
continuity of textured or colored structures.

Chapter five is devoted to the problem of interpretation of outdoor
scenes. We describe our earlier work using a pattern recognition
approach to the classification of texture samples. Then we
present an analysis of the use of texture gradient in determining the
orientation of surfaces and relative depth. A simple world model is
presented for outdoor scenes. We discuss higher-level procedures
that interpret two examples of outdoor scenes.
This higher-level program has not been implemented, but it gives a good
perspective from which to evaluate the modules developed, and it is the
target at which we aimed.




1.3 Previous Results

During  the  past  five  years  or  so,  a  great  deal  of  work  has  been 
done  in  the  area  of  computer-based  visual  perception,  computer 
recognition,  and  computer  identification  of  visual  scenes  and  patterns. 
Several  results  have  been  obtained  in  computer  analysis  of  two- 
dimensional  images,  interpreted  as  projections  of  geometrically  simple 
real  objects.  See  particularly,  Guzman  (1968),  Brice  and  Fennema  (1970), 
Pingle  (1969),  and  Falk  (1970). 

Polyhedra and collections of polyhedra are recognized from single-view
projections. First the meaningful edges are recognized, then the
main regions, and from these, finally, in combination with the world
model, the identification of the objects is inferred.

Much less attention has been paid to computer analysis of two-dimensional
pictures which depict real-world scenes. What we mean here
are scenes such as forest, grass, water, and their combinations. The
separate regions, formed by, say, grass and bushes, do not differ in
contrast of light intensity, nor in color (as both are usually green),
but rather in their texture.

A  primary  problem  in  texture  is  how  we  perceive  a  textured  surface 
as  uniform  in  a  nontrivial  way.  Intuitively  speaking,  there  are  many 
levels  on  which  one  can  perceive  texture.  In  one  situation  we  may  look 
at  the  pattern  showing  how  bricks  are  distributed  on  the  wall  and  call 
that  a  texture.  In  another  situation  we  may  have  a  closer  look  at  the 
same  wall  from  the  same  distance  and  see  the  texture  of  the  individual 
bricks  and  ignore  the  texture  given  by  the  architectural  structure  of 
bricks.

Flock (1965) and Freeman (1970), reporting about various
experiments in connection with testing aspects of visual perception
pattern into components each of which is in some way internally
uniform (Wertheimer (1912)).


1.3.1 Neurophysiological Studies

In  what  follows  we  shall  present  a  review  of  certain  experimental 
results  dealing  with  visual  feature  detection  in  animals.  The  general  scheme 
of  experimentation  is  as  follows.  The  set  of  stimuli  consists  of  geometric 
entities  such  as  slits,  edges,  bars,  and  corners.  Recordings  are  made  from 
single  cells  or  a  small  number  of  cells  in  a  sequence  along  the  direction 
of  insertion  of  an  electrode  in  the  visual  system  of  animals  (mostly  cats 
and  monkeys).  The  conclusion  is  that  there  are  special  neurophysiologic 
units,  identifiable  in  well-defined  parts  of  the  brain,  capable  of  detecting 
motion,  orientations,  and  other  features  of  the  visual  stimuli. 

For instance, Kuffler (1953), placing microelectrodes near retinal
ganglion cells of cats, found that certain areas of the retina, when stimulated
by spots of light, caused the ganglion cells to fire, while other areas
inhibited firing. The excitatory area of a retinal ganglion cell was
shaped as a small disk surrounded by an inhibitory annulus, or vice versa.

The retinal areas from which firing can be evoked are known as receptive fields.

Concentric receptive fields have also been found in the optic nerve
and in the LGN (lateral geniculate nucleus) of cats and monkeys (Hubel (1960)
and Wiesel and Hubel (1960)). The only difference between retinal ganglion
cells and LGN cells is that the receptive fields in the LGN are smaller.
Also, the receptive fields in the LGN of monkeys are smaller than those in cats.
The concentric receptive fields have a characteristic temporal behavior: if
the center of the field fires for "on" responses, then the annulus fires
for "off" responses, or vice versa.

Only spatial changes evoke responses, while homogeneous illumination,
however strong, has very little influence on the firing of these units. In
functional terms these are "discontinuity detectors". Of course, there
are "Ganzfeld detectors" in the retina, responsive to the average brightness
of a large region, which regulate the pupillary mechanism through the
superior colliculus, but we are only interested in neural units that
participate in processing patterned stimuli.
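A Kuffler-type concentric receptive field behaves like a small image operator with a positive center and a negative surround. The sketch below is a minimal illustration, assuming a disk center of radius 1 pixel and an annular surround out to radius 3 pixels, with weights chosen (our assumption, not a physiological measurement) so that center and surround cancel exactly. It shows why homogeneous illumination, however bright, produces no response while a brightness edge does. The names `center_surround_kernel` and `correlate_valid` are hypothetical.

```python
import numpy as np


def center_surround_kernel(r_center=1, r_surround=3):
    """On-center/off-surround weights: +1 spread over the disk, -1 over the annulus."""
    y, x = np.mgrid[-r_surround:r_surround + 1, -r_surround:r_surround + 1]
    d = np.hypot(x, y)
    center = d <= r_center
    surround = (d > r_center) & (d <= r_surround)
    k = np.zeros(d.shape)
    k[center] = 1.0 / center.sum()
    k[surround] = -1.0 / surround.sum()
    return k  # weights sum to zero, so a uniform field gives no response


def correlate_valid(image, kernel):
    """Slide the kernel over the image (no padding) and return the weighted sums."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


k = center_surround_kernel()
uniform = np.full((12, 12), 200.0)       # bright, homogeneous illumination
edge = np.zeros((12, 12))
edge[:, 6:] = 100.0                      # a vertical brightness step
print(np.abs(correlate_valid(uniform, k)).max())   # essentially zero
print(np.abs(correlate_valid(edge, k)).max())      # clearly positive near the step
```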

A revolutionary discovery was the description of the relation between
receptive field geometry and the cytoarchitecture of the cortex.
Mountcastle (1957) discovered the columnar organization of the cat's
somatosensory cortex. This vertical modular arrangement in the somesthetic
cortex means that units along a column perpendicular to the cortical
surface all give rise to the same sensory discharge. In the monkey,
cells along one column respond to skin touch-pressure, and the
cells along another column to joint rotation (Powell and
Mountcastle (1959)). The significance of this correlation between
cortical organization and functional organization became fully apparent in
the findings of Hubel and Wiesel (1960, 1962) in the visual cortex of the cat.
They found feature extractors of hierarchically increasing complexity.
As one goes from the so-called simple units, which have elongated
receptive fields with antagonistic surroundings (also called slit or edge
detectors), to complex and hypercomplex units that respond to highly special
features (like movement in a certain direction, or the end of a line), one
notices that despite their diversity, all of these feature extractors have
a common characteristic: they all respond optimally to a certain
orientation. In a vertical module (column) perpendicular to the cortical
surface of the cat and the monkey (Hubel and Wiesel 1960, 1962, 1968)
there are several types of units of the simple, complex, or even
hypercomplex kind. But in a given column, all detectors have the same
preferred direction. In addition to this mapping of the orientation
information, the retinal position is also maintained, and units with
receptive fields in neighboring retinal positions tend to lie in close
proximity.

Another remarkable finding by Hubel and Wiesel is the hierarchy of
feature extraction. Each unit in the hierarchy is built from the outputs
of units of lower complexity using both excitatory and inhibitory connections.
The simple units of slit or edge detector type are built from the so-called
Kuffler units in the LGN by "summing" several adjacent Kuffler units that
fall on a line of a given orientation. This summation results in a narrow
elongated receptive field having an elongated elliptical excitatory (inhibitory)
area surrounded by an antagonistic neighborhood. Such cortical units fire
optimally for those line segments (slits or edges) that fall on the proper
location on the retina and have the preferred orientation. These simple
units are the only ones (in addition to Kuffler units) whose receptive
fields can be plotted with luminous dots and segregated into inhibitory and
excitatory areas. The complex and hypercomplex units, on the other hand,
respond to such complex features as movement of an edge in a certain
orientation and direction or the perpendicularity of two intersecting line
segments. Here it seems that the notions of straightness, orientation,
velocity, position, parallelism, perpendicularity, abrupt ending of a line,
corners, and so on, appear as features.
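The summation hypothesis can be illustrated by adding together several copies of a center-surround unit whose centers lie along a line. This is a minimal sketch under our own assumptions: nine units spaced one pixel apart, each with the disk-and-annulus weighting of a Kuffler unit (positive center, negative surround, equal total magnitude), and the line orientation quantized by rounding to the pixel grid. The names `kuffler_unit` and `simple_unit` are hypothetical. The resulting field responds strongly to a line at its preferred orientation and not at all to the perpendicular line through the same point.

```python
import numpy as np


def kuffler_unit(r_c=1, r_s=3):
    """Concentric unit: +1 spread over a disk of radius r_c, -1 over the annulus out to r_s."""
    y, x = np.mgrid[-r_s:r_s + 1, -r_s:r_s + 1]
    d = np.hypot(x, y)
    c = d <= r_c
    s = (d > r_c) & (d <= r_s)
    k = np.zeros(d.shape)
    k[c] = 1.0 / c.sum()
    k[s] = -1.0 / s.sum()
    return k


def simple_unit(theta_deg, n=4, r_c=1, r_s=3):
    """Receptive field made by summing 2n+1 Kuffler units centred on a line at theta."""
    half = n + r_s
    field = np.zeros((2 * half + 1, 2 * half + 1))
    unit = kuffler_unit(r_c, r_s)
    t = np.deg2rad(theta_deg)
    for k in range(-n, n + 1):
        dx = int(round(k * np.cos(t)))    # column offset of this unit's centre
        dy = int(round(k * np.sin(t)))    # row offset of this unit's centre
        r0, c0 = half + dy - r_s, half + dx - r_s
        field[r0:r0 + unit.shape[0], c0:c0 + unit.shape[1]] += unit
    return field


rf = simple_unit(90)                  # units stacked vertically: prefers vertical lines
mid = rf.shape[0] // 2
vertical = np.zeros_like(rf)
vertical[:, mid] = 1.0                # bright vertical line through the centre
horizontal = np.zeros_like(rf)
horizontal[mid, :] = 1.0              # the same line rotated by 90 degrees
print(np.sum(rf * vertical), np.sum(rf * horizontal))
```

With these particular weights the perpendicular response cancels exactly, because each unit contributes one row of its own kernel and those rows add up to the kernel's zero total; other weightings would give a small but nonzero response.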

The primary problem for vision, namely how this information is used at
later stages, remains unexplored.

1.3.2 Psychophysical Experiments Suggesting the Existence of
Visual Frequency Analyzers

In this section we give a brief survey of experiments concerning the
alleged existence of spatial frequency analyzers located in the neural
system of human subjects. The set of stimuli consists of simple line patterns
with differing orientations, contrast, and spatial frequency.

The human subjects involved in the experiments are asked to report the
threshold contrast of the stimuli. Certain aspects of the responses are used
as arguments for and against the existence of a frequency analyzer in the
subject. Enroth-Cugell and Robson (1966) and Campbell and Robson (1968)
claim to have found neurophysiological evidence for a spatial frequency
analyzer. Several experiments have been done to determine the properties
(such as the transfer function) of the hypothetical analyzer using masking
methods. For example, Pollehn and Roehrig (1970) used filtered two-dimensional
visual noise and spatial sinusoidal gratings. Julesz and Stromeyer (1970)
used one-dimensional filtered noise for masking. The noise consisted of
vertical strips whose amplitude along the horizontal axis of a CRT monitor
was determined by a Gaussian process. The visibility of a sinusoidal
grating is strongly dependent on the frequency of the grating. If the grating
frequency overlapped the noise band, it was masked. However, the rejection
band had to be at least an octave wide on either side (Blakemore and Campbell
(1969) and Julesz (1971)). The frequency analyzers have such a shallow
characteristic that the analogy to Fourier analysis is rather remote.

Historically, the first hint of spatial frequency analysis was made
by Campbell and Kulikowski (1966), who investigated the visibility
threshold of a test grating as a function of the orientation of a masking
grating. They briefly mentioned that the maximum threshold increase occurs
when the masking and test gratings have similar geometry. That spectral
analysis actually occurs in the visual system was suggested by Pantle and
Sekuler (1968), using adaptation and test gratings of different frequencies,
and by Campbell and Robson (1968), who noted that a square grating appears
as a sinusoidal grating until the higher harmonics reach their visibility
thresholds.
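The Campbell and Robson observation rests on a standard Fourier fact: a square-wave profile of unit amplitude contains only odd harmonics, with amplitudes 4/(πn) for n = 1, 3, 5, ..., so its third harmonic has one third of the contrast of the fundamental. The sketch below checks this numerically on a sampled profile; the sample count, the number of cycles, and the function name `harmonic_amplitudes` are arbitrary illustrative choices.

```python
import numpy as np


def harmonic_amplitudes(n_samples=1024, cycles=8, harmonics=(1, 2, 3, 5)):
    """Amplitudes of selected harmonics of a sampled square-wave grating profile."""
    x = np.arange(n_samples) / n_samples
    # +1 on the first half of every period, -1 on the second half (50% duty cycle)
    square = np.where((x * cycles) % 1.0 < 0.5, 1.0, -1.0)
    # rfft bin m holds m cycles per profile; dividing by N/2 turns the
    # magnitude into the amplitude of the corresponding sinusoid
    spectrum = np.abs(np.fft.rfft(square)) / (n_samples / 2)
    return {h: spectrum[h * cycles] for h in harmonics}


amps = harmonic_amplitudes()
for h, a in amps.items():
    print(h, round(a, 4), round(a / amps[1], 4))
```

Since the third harmonic is three times weaker than the fundamental, the square grating can be detected at a contrast at which that harmonic is still below its own threshold, and the grating is then indistinguishable from a sinusoid. How much higher the contrast must go depends on the relative sensitivity at f and 3f, which this sketch does not model.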

A recent study by Nachmias et al. (1969) showed that at the threshold
of visibility the various spatial frequency analyzers are statistically
independent of each other (as long as the various spectral components
have frequency ratios in excess of 5:4). The finding by Nachmias et al.
(1969) indicates that at threshold the phase of visual or auditory signals
is not detected by the perceptual system. However, for perception above
the threshold level, the phase information is used in higher processors.
After all, both the impulse function and white noise have the same flat
amplitude spectrum, but very different phase spectra. The fact that they
are heard and seen as being very different shows that ultimately the phase
information is utilized.
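The impulse-versus-noise argument can be checked directly. In the sketch below, a signal with random phases stands in for white noise (a sample of true white noise has a flat amplitude spectrum only on average); the length of 256 samples and the random seed are arbitrary, and `amplitude_spectrum` is a hypothetical helper. The random-phase signal has exactly the same flat amplitude spectrum as a unit impulse, yet one signal is a single spike and the other is spread out like noise, so all of the difference lies in the phase.

```python
import numpy as np


def amplitude_spectrum(signal):
    """Magnitudes of the one-sided discrete Fourier transform."""
    return np.abs(np.fft.rfft(signal))


n = 256
impulse = np.zeros(n)
impulse[0] = 1.0                                   # a single bright point

# Unit amplitude at every frequency, as for the impulse, but with random phases.
rng = np.random.default_rng(0)
spectrum = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n // 2 + 1))
spectrum[0] = 1.0                                  # DC and Nyquist terms must be real
spectrum[-1] = 1.0                                 # for the inverse to be a real signal
noise_like = np.fft.irfft(spectrum, n)

print(np.allclose(amplitude_spectrum(impulse), 1.0),
      np.allclose(amplitude_spectrum(noise_like), 1.0))
print(impulse.max(), np.abs(noise_like).max())     # a spike versus a spread-out signal
```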

1.3.3 Psychological Studies in Pattern Grouping

The topic of this section is a discussion of the psychological
literature on texture grouping. The grouping process depends heavily
on criteria of similarity of items. Although it has been known for
some time that similarity is one of the most important features of
perceptual grouping, only recently, in the work of Julesz (1971),
Beck (1967), and Attneave and Olson (1970), has it been made
explicitly clear what kinds of similarities are effective in this respect.

Beck (1967) has studied perceptual groupings produced by
line figures. He showed that the overall orientation was essential
for cluster formation, while more complex properties such as rated
similarity or familiarity of figures were irrelevant. For example, a T and
a tilted T are judged more similar than a T and a reversed L; however,
as a texture, T and tilted T form a more distinguishable texture than
T and reversed L.

This has been confirmed by Attneave and Olson (1970), who carried out a
similar and more extensive study with different shapes such as L, J, A
and V, and lines of different lengths and orientations. Directionality
was important in grouping. We might expect curvature to be important
also, but curved lines were grouped with straight lines which had the
same direction.

Grouping was also dependent on the orientation of the whole image.
In general, grouping and complementary segregation are based on certain
descriptors, some of which represent relationships of elements of the
stimulus array to an internal Cartesian reference system.

Julesz (1962) has studied the clustering problem on random-dot
textures and stereograms. He described textures and predicted their
properties by specifying their higher order statistics.
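Describing a texture by statistics of different orders can be made concrete with binary textures. In the sketch below (the striped textures, the "dipole" statistic, and the function names `first_order` and `dipole_same_color` are our illustrative choices, not Julesz's stimuli), two textures have identical first-order statistics, the same fraction of black and white cells, but differ in a second-order statistic: the probability that both ends of a dipole of a given length and orientation fall on cells of the same color.

```python
import numpy as np


def first_order(img):
    """First-order statistics of a binary texture: fractions of 0- and 1-cells."""
    return np.bincount(img.ravel(), minlength=2) / img.size


def dipole_same_color(img, dy, dx):
    """Probability that both ends of a dipole with offset (dy, dx) land on equal values."""
    h, w = img.shape
    a = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = img[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
    return float(np.mean(a == b))


cols = np.arange(8) % 2
vertical_stripes = np.tile(cols, (8, 1))      # columns alternate 0, 1, 0, 1, ...
horizontal_stripes = vertical_stripes.T.copy()

print(first_order(vertical_stripes), first_order(horizontal_stripes))
print(dipole_same_color(vertical_stripes, 0, 1),
      dipole_same_color(horizontal_stripes, 0, 1))
```

Both textures are half black and half white, so no first-order measure separates them, while a horizontal dipole of length one always straddles a color change in one texture and never in the other.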

The usual joint probability distribution is an inadequate
descriptor in perception, since it does not describe the shape of
clusters. There are at least two ways to handle this difficulty.
One way is to define certain information rules for single clusters and
parametrize them (orientation, compactness, etc.). The trouble with
this solution is the arbitrariness of the selection of cluster
parameters. The second way is to use constructs of random geometry.
Novikoff (1962) was the first to suggest such a solution.

The clustering process is dependent on the similarity and
the proximity of elements. The similarity relation is relativized
to brightness, color, geometrical descriptors and other parameters.
The proximity relation is based on a distance measure. Nonmetric
multidimensional scaling techniques (Shepard (1962), Kruskal (1964))
and hierarchical cluster-seeking algorithms are useful tools for handling
similarity problems. The methods proposed by Shepard and Kruskal,
however, are appropriate only for linear or multilinear cases. In a
nonlinear situation an iterative algorithm is applied to small local
regions in order to find an intrinsic dimensionality (Shepard and Carroll
(1966), Bennett (1969), Fukunaga and Olsen (1971)). Applying
multidimensional scaling to discrimination of textures composed of
random arrays, Julesz (1971) found that the most important factors
for texture discrimination were brightness and orientation.
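The scaling methods cited above (Shepard, Kruskal) are nonmetric: they use only the rank order of dissimilarities and require an iterative fit. As a simpler stand-in, the sketch below implements classical metric scaling (Torgerson's method), which is a different technique: it recovers coordinates from a matrix of Euclidean distances by an eigendecomposition, and the number of clearly positive eigenvalues estimates the linear dimensionality of the data. The function names and the collinear test points are hypothetical; for curved (nonlinear) data this global estimate overstates the intrinsic dimensionality, which is why the local iterative methods cited above are needed there.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, dim) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def classical_mds(dist, k):
    """Classical (Torgerson) metric scaling: k-dimensional coordinates from distances."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n            # centring matrix
    b = -0.5 * j @ (dist ** 2) @ j                 # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)                 # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]                 # largest first
    vals, vecs = vals[order], vecs[:, order]
    coords = vecs[:, :k] * np.sqrt(np.maximum(vals[:k], 0.0))
    return coords, vals


# Four points that lie on one straight line inside 3-D space.
t = np.array([0.0, 1.0, 3.0, 7.0])
points = np.outer(t, [1.0, 2.0, 2.0]) / 3.0        # unit direction (1, 2, 2) / 3
d = pairwise_distances(points)
coords, vals = classical_mds(d, k=1)
print(np.sum(vals > 1e-9 * vals[0]))               # eigenvalues that matter
```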


We have seen several aspects of visual textures, mostly from the
point of view of psychology and psychophysiology. Now we shall examine
the characteristics and parameters of a texture with respect to some of
the approaches that have been used in machine recognition of textures.

The best review paper about the current state of texture extraction
technology is that of Hawkins (1970). According to him, there are four
types of approaches that have been taken to texture classification:

(1) spatial frequency content, (2) gray level content, (3) local shape
content, and (4) higher order measures. Most of the early machine
texture recognition was related to the analysis of aerial photographs. As
an example, the work of Lendaris (1970) can be mentioned. Lendaris
analysed aerial photographs of agricultural landscapes as
well as of urban areas. For recognition purposes he used the power
spectrum of the brightness function over windows of constant size.

The power spectrum is analysed in the following way: first, two functions
are formed, one giving the energies along different directions and the other
the energies at different frequencies. From these two functions, feature
vectors are created. The features are the number of peaks and the
strength of the peaks in both functions. These feature vectors are used
for classification of the area. He distinguishes four classes:
man-made areas (cities)
agricultural areas
a single road
intersection of two roads.
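A minimal sketch of Lendaris-style features follows, assuming a square gray-level window held as a NumPy array. The equal-width ring (frequency) and wedge (direction) bins and the simple local-maximum peak test are our own choices for illustration, not Lendaris's exact procedure.

```python
import numpy as np

def ring_wedge_features(window, n_rings=8, n_wedges=8):
    """Power-spectrum energy summed over frequency rings and direction wedges."""
    w = np.asarray(window, dtype=float)
    w = w - w.mean()                                  # remove the DC term
    power = np.abs(np.fft.fftshift(np.fft.fft2(w))) ** 2
    rows, cols = power.shape
    v, u = np.indices(power.shape)
    v = v - rows // 2                                 # frequency coordinates,
    u = u - cols // 2                                 # zero at the centre
    radius = np.hypot(u, v)
    angle = np.mod(np.arctan2(v, u), np.pi)           # spectrum is symmetric: use 0..pi
    ring_idx = np.minimum((radius / radius.max() * n_rings).astype(int), n_rings - 1)
    wedge_idx = np.minimum((angle / np.pi * n_wedges).astype(int), n_wedges - 1)
    rings = np.bincount(ring_idx.ravel(), weights=power.ravel(), minlength=n_rings)
    wedges = np.bincount(wedge_idx.ravel(), weights=power.ravel(), minlength=n_wedges)
    return rings, wedges

def peak_features(energy):
    """Number and strengths of the local maxima of a 1-D energy profile."""
    e = np.asarray(energy)
    padded = np.concatenate(([-np.inf], e, [-np.inf]))
    is_peak = (e > padded[:-2]) & (e > padded[2:])
    return int(is_peak.sum()), e[is_peak]
```

The classifier that consumes the peak counts and strengths is not shown; any standard pattern classifier could be substituted.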

Another way of describing repetitive patterns is to use some
statistical features of the brightness function forming the pattern.


This has been used with some advantage in analysing biological material
(Lipkin et al. (1966), Prewitt and Mendelsohn (196?)), cloud pattern
classification (Darling and Joseph (196?)), and the discrimination of
strategic and tactical targets and terrain classification. The statistical
features, though sometimes useful, have their limits. Thus the variance
of a salt-and-pepper scene is the same as that of a white scene with a
uniform dark area (given the same proportion of dark points). The size of
connected areas (think of clouds, for instance) can vary over a wide range.
The number of changes (zero crossings) is informative only within a certain
context, when combined with other features such as direction. Histograms
are useful in estimating the light distribution in the picture and in
setting up the threshold values for measurements.
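The limitation of the variance can be checked numerically. The sketch below, with an arbitrary image size and random seed, builds a salt-and-pepper scene and a white scene with one dark half, both with exactly half their points dark. The variances agree, while the count of brightness changes along rows separates them, which is why such counts are useful only in combination with other features.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Salt-and-pepper scene: exactly half the points dark (0), half white (1), at random
salt_pepper = rng.permutation(np.repeat([0.0, 1.0], n * n // 2)).reshape(n, n)

# White scene with one uniform dark area covering the left half
blocky = np.ones((n, n))
blocky[:, : n // 2] = 0.0

def changes_per_row(img):
    """Average number of brightness changes between horizontally adjacent points."""
    return np.count_nonzero(np.diff(img, axis=1), axis=1).mean()

print(salt_pepper.var(), blocky.var())                         # 0.25 0.25
print(changes_per_row(salt_pepper), changes_per_row(blocky))   # many changes vs. exactly 1.0
```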

Shape measures used in texture analysis have involved applying a
particular local "matched filter" to every point in the image area and
counting the number of points that match above some threshold. This has
been applied to the previously noted examples of classifying biological
material (Prewitt and Mendelsohn (196?)), to cloud classification, and
to target and terrain classification (Hawkins (1970)). A more analytic
approach to the shape description of chromosomes is taken in terms of conic
sections. An individual chromosome is defined as a non-negative function
on the real plane, subject to certain constraints on position, size,
orientation, etc. Ledley et al. (19??) suggested a simple method of
measuring concavity and convexity. Integral geometry measures (Julesz
(19??)) and their extensions amount to calculating the number of occurrences
of n-tuples of specially arranged local points in all orientations over
the image area.
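A minimal sketch of matched-filter counting follows. It assumes normalized correlation (zero-mean, unit-norm template and patch) as the matching score and a user-chosen threshold; both are our assumptions. The direct double loop over all positions makes visible why the method becomes slow when many templates must be tried.

```python
import numpy as np

def matched_filter_count(image, template, threshold):
    """Count positions where the normalized correlation with `template` >= threshold."""
    img = np.asarray(image, dtype=float)
    t = np.asarray(template, dtype=float)
    t = t - t.mean()
    t = t / (np.linalg.norm(t) + 1e-12)            # zero-mean, unit-norm template
    th, tw = t.shape
    count = 0
    for r in range(img.shape[0] - th + 1):         # every valid placement
        for c in range(img.shape[1] - tw + 1):
            patch = img[r:r + th, c:c + tw]
            p = patch - patch.mean()
            norm = np.linalg.norm(p)
            if norm > 0 and np.sum(p * t) / norm >= threshold:
                count += 1                         # this point "matches"
    return count
```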


* 


£ 


A 


£ 


£ 


£ 


£ 


£ 


t 


T. 


Matched filters allow one to describe practically any shape;
however, the matching process is rather slow, because it requires the
computation of a large number of correlations and hundreds of patterns.
The similarity relation can be defined in a straightforward
fashion in terms of the threshold values.

Simple descriptors such as convexity, the ratio of boundary length to area,
etc., require little computation time, but similarity relations based on
these simple descriptors are not usually sufficient for sharp decisions.
Another set of simple descriptors has been suggested and implemented by
Rosenfeld and Thurston (1971). They use, in parallel, several local
averaging operators applied in different directions and on windows of
various sizes. All results obtained from these local operators are evaluated
and eventually a texture boundary is found. Though this method finds
some texture boundaries, the operators are too trivial for handling
a wide class of real textures; besides, they do not provide any description
of a texture, they only detect texture differences.
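The following sketch is in the spirit of Rosenfeld and Thurston, not a reproduction of their operators. It assumes a single vertical texture boundary, uses the absolute horizontal brightness difference as the local property, compares averages over adjacent k x k windows for a few hypothetical sizes k (horizontal direction only), and reports the column with the largest difference. As the text notes, the output is only a boundary location, with no description of either texture.

```python
import numpy as np

def box_mean(img, k):
    """Mean over every k x k window (valid positions only), via a summed-area table."""
    s = np.pad(np.cumsum(np.cumsum(img, axis=0), axis=1), ((1, 0), (1, 0)))
    return (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / (k * k)

def horizontal_difference(img, k):
    """|mean of a k x k window - mean of the adjacent k x k window to its right|."""
    m = box_mean(np.asarray(img, dtype=float), k)
    return np.abs(m[:, :-k] - m[:, k:])

def texture_boundary_column(img, sizes=(2, 4, 8)):
    """Estimate the column of a single vertical texture boundary."""
    img = np.asarray(img, dtype=float)
    busy = np.abs(np.diff(img, axis=1))            # local property: horizontal "busyness"
    best_score, best_col = -1.0, None
    for k in sizes:                                # several window sizes
        profile = horizontal_difference(busy, k).mean(axis=0)   # average over rows
        j = int(np.argmax(profile))
        if profile[j] > best_score:
            best_score, best_col = profile[j], j + k   # start of the right-hand window
    return best_col
```

For a picture whose left half alternates between black and white columns and whose right half is a flat gray of the same mean, plain brightness averages show no boundary, but the averaged busyness does.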

All  the  approaches  discussed  above  are  pattern  classification 
techniques.  These  techniques  are  not  satisfactory  for  a  description  of 
real  textures  for  the  following  reasons: 

(1) Pattern classification techniques have concentrated on linear
decision procedures and domain-independent formulations. Context appears
only as a set of numerical coefficients in a linear function and in the
choice of features. We have better models in terms of context-dependent
decision trees, which provide a better basis for generalisation and
learning.

(2) Structural relationships and segmentation are part of the
desired analysis. We discuss this further in our analysis. The point
has been made repeatedly by picture linguists.

PICTURE LINGUISTIC FORMALISM

In what follows we shall review the so-called linguistic approach.
"Picture linguists" take as their principal aim the analysis of discrete
pictures such as bubble chamber photographs, biomedical pictures of
neurons, blood cells, and chromosomes, machine-printed and hand-printed
characters, fingerprints and the like. They argue rather convincingly
that such pictures cannot be identified by means of classical
receptor/categorizer devices. What one is after in this situation is not
just a classification, but rather an articulated (discursive) description or
identification, capturing the structured subparts of a picture and the
relations between them (Miller and Shaw (1968), Narasimhan (1970),
Clowes (1970)).

One has to assume that certain pieces of information have already
been extracted from the picture by means of nonlinguistic techniques
(the texture elements and their possible structuring are known). We combine
this prior knowledge with the data about the analyzed picture and then
"deduce" its structural description. The "deduction" is accomplished by
a grammar. Because a picture cannot be described in terms
of strings of subpictures, phrase-structure grammars cannot be used
directly. The rewriting rules must act on more general entities such
as arrays, drawings, labeled graphs (webs), multigraphs, etc. For example,
Kirsch (19??) and Dacey (19??) designed grammars for two-dimensional
languages, where the generating rules act on arrays. Pfaltz
and Rosenfeld (1969) used for picture description the so-called web
grammars, in which the rules act on labeled directed graphs. Put simply,
in picture grammars one tries to replace the total ordering of strings
by a partial ordering of graph structures so that parsing can still
work.

The language of a graph grammar is nothing but the collection
of graphs that can be derived from the initial graphs by iterated application
of the rewriting rules.

For instance, one can construct a grammar for directed two-terminal
series-parallel networks or neural networks (Pfaltz (1970)).
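To make the notion of a graph language concrete, here is a toy grammar for directed two-terminal series-parallel networks. The edge-tuple representation and the two rewriting rules (series and parallel replacement of one edge) are our own minimal choices; web grammars as used by Pfaltz and Rosenfeld also carry node labels and embedding rules, which this sketch omits. Isomorphic graphs with different node numbering are counted separately.

```python
# A two-terminal network is a sorted tuple of directed edges (u, v);
# node 0 is the source terminal and node 1 the sink terminal.
START = ((0, 1),)

def rewrite(graph):
    """Yield every graph obtained by applying one rule to one edge.

    Series rule:   u -> v  becomes  u -> n -> v   (n is a new node)
    Parallel rule: u -> v  becomes  two parallel edges u -> v
    """
    new_node = 1 + max(max(edge) for edge in graph)
    for i, (u, v) in enumerate(graph):
        rest = graph[:i] + graph[i + 1:]
        yield tuple(sorted(rest + ((u, new_node), (new_node, v))))   # series
        yield tuple(sorted(rest + ((u, v), (u, v))))                 # parallel

def language(max_steps):
    """All graphs derivable from START in at most `max_steps` rewritings."""
    derived, frontier = {START}, {START}
    for _ in range(max_steps):
        frontier = {g for f in frontier for g in rewrite(f)} - derived
        derived |= frontier
    return derived
```

After one step the language holds the start edge, a two-edge chain, and a pair of parallel edges; each further step adds one edge.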

It is believed that the organization of textured regions in scenes would
be another promising field of application, particularly when the number
of different textured regions occurring in a scene is small and when
their organization is such that a moderate set of rewriting rules can
do the job.

Methodical  scanning  of  the  picture  with  a  prescribed  system 
of rules, which may be feasible when the variety of possible texture
elements  and  their  interconnections  is  small,  becomes  rapidly  uneconomical 
where  many  varieties  of  wanted  textures  may  exist,  embedded  in  a 
background  containing  many  similar  forms  which  do  not  belong  precisely 
to  the  required  category. 

The intricacy of textured picture recognition is associated
not only with the presence of a very large number of texture
elements, but also with the placement rules, which seem to have an
extremely complicated grammatical structure.


To sum up, the linguistic method is suitable for those classes
of pictures that contain a small number of primitive objects. The
primitives have to be found accurately; otherwise the parsing process
will terminate in misrecognition. The picture must be recursive
in nature, so that a small number of rewriting rules can be used.

Picture languages are inappropriate in situations where the
number of primitives is large and the geometrical relationships
between these primitives are random. This is the case for most
scenes such as fabrics, aerial photographs (large number of primitives),
cloud covers, grass, bushes (random relationships), shading of smooth
objects, textured surfaces of three-dimensional objects (continuity),
and similar natural or artificial scenes with strong aspects of
repetitiveness, continuity, and regioning, and with intricate changes
in gray levels and colors. On the other hand, descriptions of models
and scene elements are graphs, and there is a broad analogy to picture
languages in other approaches.

ANALYTIC  FUNCTION  APPROACH 

A (real two-dimensional discrete rectangular) picture is
represented by a pair $\langle I_1 \times I_2, p \rangle$, where $I_1$ and $I_2$ are
nonempty finite intervals of integers and $p$ is an arbitrary real-valued
function $p: I_1 \times I_2 \to \mathbb{R}$. If $X = I_1 \times I_2$ is fixed, one can
identify the picture with $p$.

The definition itself is empty: it places no constraint on $p$. We may
proceed to try to approximate the picture function by analytic functions
defined on subsets of the image plane. About the only useful analytic
properties are those based on periodicity. These constraints are inspired by
pictures in which the regions are generated from texture elements by
a more or less straightforward family of analytic rules.

A  sample  of  special  cases  of  deterministically  textured  pictures 
is  listed  below: 

(a) If $p$ is spatially periodic on a connected region
$S \subseteq X$, i.e.,
$$p(x + v, y) = p(x, y), \qquad p(x, y + w) = p(x, y),$$
where $(x, y) \in S$ and $\langle v, w \rangle$ is the spatial period,
and if $p|_S$ cannot be extended to a larger connected region $S'$ ($S \subset S'$)
without losing the periodicity of $p|_{S'}$, then $\langle S, p \rangle$ is called a
periodically textured region.

A picture decomposable into a family of periodically textured
regions $\{R_i \mid 1 \le i \le k\}$ with spatial periods $\langle v_i, w_i \rangle$
($X = \bigcup_i R_i$ and $R_i \cap R_j = \emptyset$ when $i \ne j$) is called a
periodically textured picture.

Simple visual patterns, such as a rectangle covered by a mosaic
of squares, triangles, circles, etc., are examples of periodically
textured pictures. Brick wall, honeycomb, herringbone and many other
ornamental or mosaic patterns also belong to this class of pictures.

Note that in this case only two texture elements are involved
(black and white squares, etc.) and the whole picture is described by
a finite group of translations in two directions (the direct product of two
translation groups of integers modulo $v$ and $w$, i.e. $\mathbb{Z}_v \times \mathbb{Z}_w$).
Thus textured pictures of this kind can be defined in terms of their texture
elements and an appropriate finite group of translations. Only a small degree
of complication arises when the textured picture is decomposable into
several periodically textured regions.
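For a noise-free periodically textured region, the period and the generating texture element can be recovered directly. The sketch assumes the region fills the whole array, that x indexes columns and y indexes rows, and that equality is exact; `smallest_periods` and `rebuild` are hypothetical helper names. Rebuilding the picture by indexing one period cell modulo its periods mirrors the description of such a picture by a texture element and a finite group of translations.

```python
import numpy as np

def smallest_periods(p):
    """Smallest v, w with p(x + v, y) = p(x, y) and p(x, y + w) = p(x, y).

    x indexes columns, y indexes rows. A coordinate with no period shorter
    than the array itself gives None.
    """
    p = np.asarray(p)
    rows, cols = p.shape
    v = next((d for d in range(1, cols) if np.array_equal(p[:, d:], p[:, :-d])), None)
    w = next((d for d in range(1, rows) if np.array_equal(p[d:, :], p[:-d, :])), None)
    return v, w

def rebuild(tile, shape):
    """Regenerate a periodic picture of `shape` from one period cell (translations mod w, v)."""
    y, x = np.indices(shape)
    return tile[y % tile.shape[0], x % tile.shape[1]]
```

For a brick-like tile of two rows and four columns repeated three times in each direction, `smallest_periods` returns (4, 2), and `rebuild` regenerates the pattern at any size from the top-left 2 x 4 cell.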

(b) If $p$ is partially periodic (periodic in one of its arguments)
on a connected region $S \subseteq X$, i.e.,

(i) $p(x + v, y) = p(x, y)$ (periodic in the first coordinate), or

(ii) $p(x, y + w) = p(x, y)$ (periodic in the second coordinate),
where $(x, y) \in S$ and $v$ ($w$) is the period, and if $p|_S$
cannot be extended to a larger connected region $S'$ ($S \subset S'$)
without losing the partial periodicity of $p|_{S'}$, then
$\langle S, p \rangle$ is called a partially periodic textured region.

A picture decomposable into a family of partially
periodic textured regions $\{R_i \mid 1 \le i \le k\}$ with periods
$\{v_i\}$ or $\{w_i\}$ is called a partially periodic textured
picture.

(c) If $p$ is partially almost periodic on a maximal connected
region $S$, we obtain a new class of analytically characterized textured
pictures. Here "almost periodic" means: for any $\varepsilon > 0$ there exists a
function $p': I_1 \times I_2 \to \mathbb{R}$ of the form
$$p'(x, y) = \sum_{i=0}^{n} \sum_{j=0}^{m} c_{ij}\, e^{\mathrm{i}(a_i x + b_j y)}$$
such that $|p(x, y) - p'(x, y)| < \varepsilon$,
where $\mathrm{i} = \sqrt{-1}$, $a_i, b_j \in \mathbb{R}$, $0 \le i \le n$, and $0 \le j \le m$.

The function $p'$ is not periodic in general, though it has periodic
components $c_{ij}\, e^{\mathrm{i}(a_i x + b_j y)}$, and the absolute difference
$|p'(x + v, y + w) - p'(x, y)|$ is an arbitrarily small number for a suitable
pair $\langle v, w \rangle$.
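Near-periods of an almost periodic function can be found by brute-force search. For brevity the sketch is one-dimensional (all $b_j = 0$), with two components of frequencies 1 and $\sqrt{2}$ radians per sample, an arbitrary choice. Because the frequencies are incommensurable there is no exact integer period, yet some integer shift $v \le 2000$ reproduces the function to within 0.3 everywhere; this follows from Dirichlet's simultaneous approximation theorem.

```python
import numpy as np

x = np.arange(500)   # sampled positions

def p_prime(x):
    # Two periodic components with incommensurable frequencies (1 and sqrt(2)),
    # so the sum has no exact period on the integers.
    return np.cos(1.0 * x) + np.cos(np.sqrt(2.0) * x)

def translation_error(v):
    """Maximum over the sampled x of |p'(x + v) - p'(x)|."""
    return np.max(np.abs(p_prime(x + v) - p_prime(x)))

errors = {v: translation_error(v) for v in range(1, 2001)}
best_v = min(errors, key=errors.get)
print(best_v, errors[best_v])   # a near-period and its (small, nonzero) error
```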

Assuming that the pictures under consideration are composed of
periodic, partially periodic, or almost periodic textured regions, we can
use as features the expansion of the picture function $p$ over a textured
region $S$ into periodic orthogonal series such as Fourier, Hadamard-Walsh,
etc. In fortunate cases the orthogonal series features compress
the information hidden in the textured region into a few dominant
components. This is followed by pattern classification (Rosenfeld (1962),
Julesz (1962), and Bajcsy (1970)). If the periodicity or repetitiveness
is not the most relevant aspect of the picture in question, an orthogonal
expansion may scramble the information content so that no simplification
occurs.
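The compression argument can be checked with the discrete Fourier transform. The sketch, with arbitrary sizes and random seed, measures the fraction of spectral energy carried by the k strongest coefficients: for a sum of two exactly periodic gratings nearly all of it sits in four coefficients, while for unstructured noise it is spread over the whole spectrum. A Hadamard-Walsh transform could be substituted; it is not shown.

```python
import numpy as np

def energy_in_top_k(region, k):
    """Fraction of (non-DC) spectral energy in the k strongest Fourier coefficients."""
    r = np.asarray(region, dtype=float)
    f = np.fft.fft2(r - r.mean())
    power = np.sort(np.abs(f).ravel() ** 2)[::-1]   # strongest first
    return power[:k].sum() / power.sum()

y, x = np.mgrid[0:64, 0:64]
stripes = np.sin(2 * np.pi * x / 8) + 0.5 * np.sin(2 * np.pi * y / 16)  # periodic texture
noise = np.random.default_rng(1).normal(size=(64, 64))                   # no periodicity

print(energy_in_top_k(stripes, 4))   # ~1.0: two +/- frequency pairs carry everything
print(energy_in_top_k(noise, 4))     # a small fraction
```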

A typical case appears when the phase spectrum happens to be
relevant in a Fourier expansion of a "nonperiodic" picture and we restrict
ourselves only to the power spectrum. Here the information content is
not only degraded, but also mixed in such a way that the Fourier features
are no longer relevant (Lendaris and Stanley (1970)).
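The loss of information in the power spectrum can be demonstrated directly. Because the power spectrum discards phase, a pattern, its 180-degree rotation, and any circular translation of it all share the same power spectrum. The random test pattern below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
a = (rng.random((32, 32)) > 0.7).astype(float)   # arbitrary binary test pattern
b = a[::-1, ::-1]                                 # the same pattern rotated by 180 degrees
c = np.roll(a, (5, 9), axis=(0, 1))              # the same pattern, circularly translated

def power(img):
    """Power spectrum: squared magnitude of the 2-D DFT (phase discarded)."""
    return np.abs(np.fft.fft2(img)) ** 2

print(np.allclose(power(a), power(b)), np.allclose(power(a), power(c)))  # True True
print(np.array_equal(a, b), np.array_equal(a, c))                        # False False
```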




1.4 The Contribution of This Research

We feel that this research makes the following theoretical and
experimental advances in programs for understanding textured scenes:

1. A sheaf-theoretic formalism for describing textured and
colored regions.

2. Symbolic structured description of textures.

3. Implementation of descriptors derived in the Fourier domain:
analytic expression of the spacing, size and contrast of texture elements,
and of their approximate location.

4. Forming of color regions.

5. Forming of textured regions.

6. Spatial interpretation of regions in terms of texture
gradient.

7. Description of a higher level procedure and world model for
outdoor scenes.


2. TEXTURE DESCRIPTIONS


In this chapter we discuss qualitative descriptions of visual
textures in order to suggest their implementation as
procedures. Our aim will not be detailed descriptions; in a Borges
story, a project to make perfect maps led to maps the full size of the
countries mapped. Instead, we want to characterize textures in a compact
symbolic representation which suggests correspondences with our models
and simplifies human communication and debugging. We feel that everyday
texture descriptions are good models for these purposes. At a low level,
we want to work with those descriptions to propose plausible colored and
textured regions. At a higher level, our aim is a description in object
space, not an image space map. Many interpretations and hypotheses should
be in terms of objects and properties of the object space. An example is
the interpretation of a texture gradient in the image as a distance gradient
in space. Another interpretation is that overlapping regions correspond
to foreground and background.


2.1 Examples of Outdoor Scenes

In the scene shown in Figure 1, we find three elements: grass,
water, and rocks. The grass lies on an approximately level surface. The
rock is in front of the water and behind the grass. We do not describe
the image itself, but its interpretation as objects. In describing
this scene, we emphasise its segmentation into elements which are objects
and regions in object space. This structural description characterizes
the relationships among objects and regions. For example, a tree stands
above the ground and in front of the sky. The structure allows us to
talk about complex scenes in terms of simple elements. To move about,
we must know where the grass extends, where to walk around rocks, and
where the water is. These spatial relations are essential; even if we
were able to store and recognize whole scenes, we would need a mechanism
to discover where we can walk and what we can pick up.

Grass, rocks and water correspond roughly to three regions in the
image. But these simple elements are not directly the sort of regions
which come from existing edge or region finding programs. The elements
we see are high level abstractions which do not coincide with color or
texture regions. In the first approximation, color is the most relevant
feature that distinguishes these regions. However, a closer look at the
picture suggests that the color boundaries do not correspond exactly to
the regions we see. Consider the white waves near the rocks or the dark
areas inside the grass region. Our texture region growing also defines
a set of regions. Directionality is important in the grass region, yet
that property is not uniform over the region. Thus the regions defined
by our texture descriptors do not coincide with the grass region we see.


Yet there is a continuity over the region in some of the properties of
color, size and density of grass stalks. The fact that we have similar
stalks of grass over the whole field (sometimes with different direction
or with different color) makes it possible to propose the field as an
element. This complexity makes it impractical to attempt to identify
local elements with local prototypes for grass, sky, or water, or to
attempt to identify the low level regions from our programs.

In a second example, in Figure 2, we have four elements: grass,
trees, clouds, and sky. Again, color separates the sky, clouds, grass,
trunks of trees, and, in some areas, the crowns of trees.

Texture, on the other hand, separates the trees from the grass.

In the object space description, the sky and trees are distinct. We
could arbitrarily define image regions as disjoint. Proximity of regions
of like color is one clue for proposing a connectivity among tree branches
and among fragments of sky. These connectivities reflect the object
space descriptions of trees as connected and sky as connected. The regions
based on proximity in the image are unconnected and overlapping. That
description allows an inference (which may not always be valid) that the
trees are in front of the sky. Arbitrarily defining disjoint regions
rejects these hypotheses of object space connectivity and the conclusion
of interposition from overlap.

Although the trees are approximately of the same height, and the
grass stalks are also roughly of constant height, their apparent size in
the image decreases toward the center rear of the picture. The apparent
size of the grass stalks nearest us is the same as that of the trees farthest
from us. Gibson (1950) has emphasised that perception relies heavily on the
interpretation of the systematic variation of apparent size with image
position (texture gradient) as a variation of distance from the observer.
For most purposes, the relative depth of elements in the world is
sufficient. Assuming that we know the position of the observer, the
gradient allows us to determine the absolute distance of objects. The
measurement of observer or camera position and angles, and calibration
of the image device (Sobel (1970)), are essential.
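Under a pinhole camera model, apparent size equals focal length times physical size divided by distance, so a texture gradient converts to depth. The sketch assumes a known focal length (in pixels) and texture elements of equal physical size, as with the grass stalks and trees above; it ignores camera tilt and lens distortion, which a calibration such as Sobel's would handle. The helper names and the numbers are hypothetical.

```python
def distance_from_size(focal_length_px, physical_size, apparent_size_px):
    """Pinhole camera: apparent = f * S / Z, hence Z = f * S / apparent."""
    return focal_length_px * physical_size / apparent_size_px

def relative_depth(apparent_near_px, apparent_far_px):
    """For elements of equal physical size, Z_far / Z_near = apparent_near / apparent_far."""
    return apparent_near_px / apparent_far_px

# Hypothetical numbers: stalks 0.25 m tall, focal length 1000 pixels
print(distance_from_size(1000.0, 0.25, 40.0))   # 6.25 (metres)
print(relative_depth(40.0, 10.0))               # 4.0: the far stalks are 4 times as distant
```

Relative depth needs only the ratio of apparent sizes; absolute distance also needs the focal length and the physical size, which is the role of camera calibration.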




The examples from the previous section demonstrated a structural
description of images by segmentation into elements of object space.

We further structure these textured regions in terms of texture elements
and their spatial relationships. In Table 1 we show some examples of
texture elements and their relationships as they appear in object space
and image space.

Texture elements cannot be determined in isolation. A single element
may be unrelated to the texture. The relationships are frequently only
approximate. In a texture of pebbles, size similarity may be
important even though the sizes vary significantly; still, there is a
uniformity within a factor of 10 or so. Similarities of other properties
such as contrast, shape and spatial distribution may also be only
approximate.

In practical implementations we can describe only simple relationships:
linear, periodic, regular but aperiodic, continuous, symmetric,
and the like. Likewise, shape descriptors must be relatively simple.

One may question the effectiveness of simple relationships and their
descriptors; it is reasonable to think that a more complex description of
texture elements and their relationships is necessary for an adequate
description of textures. The psychological experiments cited in Chapter 1
indicate that human differentiation of textures depends heavily on a few
simple descriptors such as contrast and directionality, and ignores even
curvature in making texture groupings. Although we cannot estimate the
computational complexity of descriptors, we have an intuitive feeling
that, in terms of time or of complexity of wiring for parallel
systems, simple descriptors such as directionality are clearly preferred.


Table 1. Texture elements and their relationships in object space and image space

Grass
  Texture elements: leaves, blades of grass
  Texture element size: width 1/4 in.; length 3-10 in.
  Spatial relationships between elements: dense, roughly parallel and vertical, partial covering
  Color: green, yellow or brown
  Boundaries of elements: fuzzy, smooth
  Geometric description of elements: linear and directional
  Expected contrast: very low

Water
  Texture elements: water waves
  Texture element size: widely variable
  Spatial relationships between elements: quite parallel waves or concentric circular waves
  Color: blue, dark blue, dark green, silver gray
  Boundaries of elements: fuzzy, smooth
  Geometric description of elements: linear, directional, concentric circles
  Expected contrast: very low

Forest
  Texture elements: (a) evergreen: trees; (b) deciduous: fruit trees
  Texture element size: width/height 1/2; length 5-20 ft.
  Spatial relationships between elements: (a) vertical and parallel; (b) vertical and parallel; partial covering
  Color: (a) crown of trees green, trunk of trees dark brown; (b) crown green, brown, yellow or red, trunk light brown
  Boundaries of elements: sharp, not smooth
  Geometric description of elements: trunks of trees: linear texture; crowns of trees: blob-like texture
  Expected contrast: high (trees with sky)

Sky
  Texture elements: homogeneous
  Spatial relationships between elements: homogeneous
  Color: blue
  Boundaries of elements: sharp, irregular (horizon)
  Geometric description of elements: homogeneous
  Expected contrast: high

Clouds
  Texture elements: a cloud
  Texture element size: 1/4-2 miles
  Spatial relationships between elements: pattern depends on weather
  Color: white, gray, red
  Boundaries of elements: fuzzy but contrasting
  Geometric description of elements: blob-like or directional
  Expected contrast: low

Brick wall
  Texture elements: bricks
  Texture element size: width 3-4 in.; length 8-15 in.; width/length 1/2
  Spatial relationships between elements: horizontal rows
  Color: gray, red, brown, yellow
  Boundaries of elements: sharp and smooth
  Geometric description of elements: bidirectional
  Expected contrast: depends on the background; low or high

Pebbles
  Texture elements: pebbles
  Texture element size: diameter 1-3 in.
  Spatial relationships between elements: randomly distributed
  Color: any color
  Boundaries of elements: sharp and smooth
  Geometric description of elements: blob-like
  Expected contrast: low or high, depending on the background


of  homogeneity,  the  regions  of  a  common  property  correspond  to  the 
regions  of  homogeneity  from  a  region  growing  operation.  The  related 
operation  of  finding  discontinuity  in  texture  properties  is  analogous 
to  edge-finding  between  homogeneous  regions.  Rosenfeld  (1970)  has 
discussed  the  problem  of  finding  texture  boundaries  as  that  of  finding 
gradients  in  the  average  values  of  statistical  measures  (which  are 
assumed  to  be  any  suitable  operator).  While  this  is  suggestive,  it 
unnecessarily  emphasizes  statistical  measures  as  opposed  to  structured 
descriptions  which  would  be  more  suitable  for  patterned  textures. 

Let  us  approach  the  question  of  the  organization  of  textured  regions. 
In  the  simplest  case,  the  picture  to  be  described  is  partitioned  into  a 
disjoint  covering  of  textured  regions. 

A somewhat more complex system of regions can be described by a
tree structure. It may be used to represent the topological organization
of brightness contours (Krakauer (1970)). While this may seem a great
generalization, a tree does not describe well the system of regions
arising from a number of descriptors. Even for a single descriptor, the tree
is rigidly hierarchical. The nodes of the representing network are used
for regions and the arrows correspond to the spatial relationships between
the regions. Systems of features lead to several networks of regions.

A single feature may give rise to a non-disjoint network of regions.

For an operator to give disjoint regions (a partition) one must assume
an equivalence relation (reflexive, symmetric, and transitive).
Quantization would be an example leading to an equivalence relation.
Gradient thresholding would be another example. Selection of typical
values, followed by thresholding within an interval, would not lead to
equivalences (being within a given distance of one another is not
transitive), so it would not lead to a partition.
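The difference between the two operators can be seen on a handful of values. The sketch assumes brightness values for six elements, a bin width of 8 for quantization, and an interval of plus or minus 3 around hypothetical typical values. Quantization yields disjoint classes; interval thresholding yields groups that share the element 14.

```python
import numpy as np

values = np.array([10, 12, 14, 16, 30, 33])   # e.g. brightness of six image elements

# Quantization: x ~ y iff they fall in the same bin (an equivalence) -> a partition
bins = values // 8
partition = {b: values[bins == b].tolist() for b in np.unique(bins).tolist()}

# Typical values plus an interval threshold: group = values within 3 of a typical value.
# "Within 3 of" is not transitive (12~14 and 14~16, but not 12~16), so groups overlap.
typical = [12, 16, 31]
groups = {t: values[np.abs(values - t) <= 3].tolist() for t in typical}

print(partition)   # {1: [10, 12, 14], 2: [16], 3: [30], 4: [33]}
print(groups)      # {12: [10, 12, 14], 16: [14, 16], 31: [30, 33]}
```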

It  is  not  necessary  to  fully  expand  the  whole  network  or  family 
of  networks.  Rather,  instead  of  thinking  of  comparing  several  networks 
derived  from  different  features,  we  use  some  simple  hypotheses  derived 
from  a  subnetwork  of  some  particular  network  and  supported  by  evidence 
from  features  which  might  imply  another  network  (which  may  never  exist 
as  such). 

We  must  deal  with  texture  boundaries  as  well  as  textured  regions. 

The  boundary  problem  is  dual  to  the  grouping  problem.  Therefore  the 
difficulties  encountered  in  a  grouping  have  their  analogs  in  boundary 
detection.  Take  as  an  example  the  scene  in  Fig.  1.  The  objects  in  this 
scene  (grass,  water,  and  rocks)  are  separated  by  physical  or  virtual 
boundaries.  Some  of  them  are  visible  while  others  are  hidden  (grass  covers 
the  boundary  between  water  and  rocks).  In  the  identification  process 
it  is  not  clear  whether  one  should  follow  the  boundaries  defined  by 
individual  texture  elements  (look  at  the  individual  straws  near  the  rocks) 
or  whether  one  should  look  for  some  kind  of  average  boundary  or  perhaps 
keep  a  spatial  gap  between  two  different  textures. 

Region  growing  operators  use  certain  similarity  criteria.  These 
are  applied  in  patching  local  structures  into  global  ones.  Whenever  we 
meet  a  dissimilarity,  a  boundary  point  or  segment  is  proposed.  In  the 
first  approximation,  a  region  is  formed  by  patching  continuous  structures 
over  connected  areas.  In  this  case  the  corresponding  boundaries  are  also 
connected.  There  may  also  be  internal,  unclosed  boundaries.  When  local 
discontinuities  occur  within  a  region,  proximity  criteria  are  used  for 
bridging  the  gaps.  The  proximity  here  is  used  as  an  extension  of  continuity. 




The  same  is  true  with  interrupted  boundaries.  Proximity  and  continuity 
of  boundary  segments  suggest  continuation. 
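A minimal region-growing sketch follows, assuming gray-level input, 4-connectivity, and a hypothetical tolerance `tol` on the difference between neighboring points (a continuity criterion). Each time a dissimilar neighbor is met, the current point is recorded as a boundary point. Because similarity is tested between neighbors, a region may contain a dissimilar pair of adjacent points reached by different paths; such pairs yield interior, unclosed boundary points. Bridging gaps by proximity is not implemented.

```python
from collections import deque
import numpy as np

def grow_region(img, seed, tol):
    """Grow a 4-connected region from `seed`, joining neighbours whose value differs
    from the current point by at most `tol`.

    Returns (region mask, boundary points); a boundary point is a region point
    at which a dissimilar 4-neighbour was met.
    """
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    region = np.zeros(img.shape, dtype=bool)
    boundary = set()
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if abs(img[nr, nc] - img[r, c]) <= tol:   # similar: patch into the region
                if not region[nr, nc]:
                    region[nr, nc] = True
                    queue.append((nr, nc))
            else:                                     # dissimilarity: propose a boundary point
                boundary.add((r, c))
    return region, boundary
```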

In  the  past  it  has  been  customary  to  think  of  regions  as  a  disjoint 
covering  of  the  image.  The  examples  in  Fig.  1  and  Fig.  2  have  shown  that 
this  conception  is  too  simple  to  be  useful.  An  equally  simplistic  point 
of  view  is  that  boundaries  of  regions  are  always  closed  curves. 


3. PROCEDURES FOR TEXTURE DESCRIPTORS


In  the  previous  chapter  we  discussed  the  description  of  texture  in 
object  and  image  space.  In  this  chapter  we  shall  specify  the  implementation 
of  these  descriptions.  Specifically,  we  shall  study  texture  descriptions 
in  the  spatial  domain  and  in  the  Fourier  domain.  Algorithms  for  concrete 
descriptors  will  also  be  presented.  Although  the  descriptors  will  be 
derived  in  the  Fourier  domain  from  the  power  spectrum,  they  actually  refer 
to  textural  properties  in  the  spatial  domain. 

We will find it useful to distinguish scalar, topological, and
geometric features (shape, area, size, boundary, connectivity, thinness
ratio) from relational features (spatial distribution, organization,
gradient).
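Scalar and geometric features of an isolated element are cheap to compute. The sketch assumes a binary mask for one element, measures the perimeter as the number of point edges between element and background, and takes the thinness ratio to be $4\pi A / P^2$; that definition is our assumption, since the text does not give one. The ratio is larger for compact elements and small for elongated ones.

```python
import numpy as np

def geometric_features(mask):
    """Area, perimeter and thinness ratio 4*pi*A / P**2 of a binary element.

    The perimeter counts 4-neighbour edges between element and background,
    including edges on the array border (the mask is padded with background).
    """
    m = np.pad(np.asarray(mask, dtype=bool), 1)
    area = int(m.sum())
    perimeter = int(np.count_nonzero(m[1:, :] != m[:-1, :]) +
                    np.count_nonzero(m[:, 1:] != m[:, :-1]))
    thinness = 4 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    return area, perimeter, thinness
```

A 4 x 4 square gives area 16, perimeter 16 and a thinness ratio of pi/4; a 1 x 8 line gives a much smaller ratio.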

3.1 Texture Descriptors Derived in the Spatial Domain

Since  descriptors  refer  to  properties  of  objects  represented  in  the 
image  space,  it  is  natural  to  look  for  operators  acting  directly  in  the 
spatial domain. This section proceeds as follows: procedures isolating
the image elements, geometric description of the image elements, and
clustering of elements based on proximity and their spatial organization.

In the process of isolating the image elements, the most important
features are the following topological properties: connectivity, continuity,
and proximity. These properties, applied to brightness or color, are used
in all region finders (Fennema and Brice (1970)). Discontinuity is the
basic property used in edge and line operators (Binford (1970),
Hueckel (1971)). Current edge and line operators are designed for
detecting discontinuities between two large homogeneous regions, and they do
not operate satisfactorily on small regions. The textured elements that
one finds in outdoor scenes are too small in size and too large in number,
and therefore cannot be processed usefully by any of the above operators.
However, under poor resolution conditions in the image, where the texture
elements are smeared (so that the homogeneity stands out more than usual),
one may be successful even with the above-mentioned operators.

Once the image elements (figures) have been isolated, we describe
them. We select those descriptors which enable clustering, i.e.,
descriptors under which proximity identifies the nearby elements.

We have already had a chance to note that color and brightness are among
the most important descriptors in natural scenes. Image elements cannot be
taken separately from their background. In fact, the common background
of the elements is a strong clue for their clustering. The relationship
between the color of an element and that of its background is expressed
in terms of contrast, and therefore it can be used as another descriptor.

The descriptors corresponding to spatial relations depend on
proximity relations, just as cluster processes depend on proximity. Typically,
we want to define colored regions by proximity rather than only by
connectivity. Grass and trees are regions broken into many fragments when
defined by connectivity, but other, similar regions are nearby. This
proximity in space and color can be phrased as a problem of proximity in
4 dimensions, using the multi-entry technique outlined by Binford in the
Stanford Progress Report of January 1971. Likewise, super-regions can be
defined by brightness, contrast, size and shape descriptors clustered
on the basis of proximity. Spatial relations (the intervals between
elements and the directions of these intervals) can also be defined among
elements linked by proximity.

As an expedient suitable for linear textures, one can
project the elements onto several directions. Each projection will
actually be a one-dimensional function of gray levels or color. Since
this function is still too complicated for practical implementation,
it is simplified by using a square-wave approximation. The square
waves are described either by edge detection operators or by the magnitude
and the distance between two consecutive zero crossings. Since the
distances between zero crossings are intervals in which the approximating
gray levels are constant, the method is called interval analysis. That
technique has been used with some success to describe regular linear
textures in an MIT term paper by Peter Wolfe (1970).
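A minimal sketch of interval analysis on a gray-level array. It assumes projections only along rows and columns, and a square-wave approximation obtained by thresholding each projection at its mean. The names `project` and `interval_analysis` are hypothetical; neither the report nor Wolfe's paper specifies them.

```python
from typing import List, Tuple


def project(image: List[List[float]], axis: int) -> List[float]:
    """Project an image onto one axis by summing gray levels.

    axis=0 sums down each column (a profile across the columns);
    axis=1 sums along each row (a profile across the rows).
    """
    if axis == 0:
        return [float(sum(col)) for col in zip(*image)]
    return [float(sum(row)) for row in image]


def interval_analysis(profile: List[float]) -> List[Tuple[float, int]]:
    """Square-wave approximation (assumption: threshold at the mean).

    Returns (level, length) for each interval between consecutive crossings
    of the mean, where level is the average profile value inside the interval.
    """
    mean = sum(profile) / len(profile)
    above = [v >= mean for v in profile]
    runs = []
    start = 0
    for i in range(1, len(profile) + 1):
        # An interval ends at the end of the profile or where the profile crosses the mean.
        if i == len(profile) or above[i] != above[start]:
            segment = profile[start:i]
            runs.append((sum(segment) / len(segment), i - start))
            start = i
    return runs


# Vertical stripes, two pixels dark and two pixels bright, four rows high.
stripes = [[0, 0, 9, 9, 0, 0, 9, 9] for _ in range(4)]
print(interval_analysis(project(stripes, axis=0)))
# [(0.0, 2), (36.0, 2), (0.0, 2), (36.0, 2)]
```

The interval lengths give the stripe widths (and, summed in pairs, the period), while the levels carry the contrast.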

Since the shape of a two- or three-dimensional object in a general
situation could be extremely complicated, we cannot hope, and in fact
do not want, to describe it in detail. Instead, complex shapes are
decomposed into simpler ones which are (hopefully) easier to describe.

A typical example is a tree, which may be decomposed into its trunk and
crown, where the trunk is geometrically linear while the crown is
blob-like. In shape analysis of outdoor scenes we find directionality among
the most useful features. One can see this immediately in Table 1.
Directionality, combined with the length/width ratio and the length along the
preferred direction, makes up a linear element description of shapes
or parts of shapes. These are all directly implementable descriptors.
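One way to compute these three descriptors for an isolated element is from its second-order central moments. The moment method is our assumption (the report does not say how the descriptors are computed), and `linear_element_description` is a hypothetical name. The moment-based ratio only approximates the length/width ratio of a bar.

```python
import math
from typing import List, Tuple


def linear_element_description(pixels: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Return (direction in radians, length/width ratio, length along the direction)."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments (covariance of the pixel coordinates).
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Orientation of the principal (preferred) axis.
    theta = 0.5 * math.atan2(2 * mxy, mxx - myy)
    # Eigenvalues of the covariance matrix: spread along and across the axis.
    common = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam1 = (mxx + myy) / 2 + common
    lam2 = (mxx + myy) / 2 - common
    ratio = math.sqrt(lam1 / lam2) if lam2 > 0 else math.inf
    # Length: extent of the pixel projections onto the preferred direction.
    ux, uy = math.cos(theta), math.sin(theta)
    proj = [(x - cx) * ux + (y - cy) * uy for x, y in pixels]
    length = max(proj) - min(proj) + 1
    return theta, ratio, length


bar = [(x, y) for x in range(10) for y in range(2)]   # a 10 x 2 horizontal bar
theta, ratio, length = linear_element_description(bar)
print((round(theta, 6), round(ratio, 2), length))     # (0.0, 5.74, 10.0)
```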

In outdoor scenes, the shapes of texture elements are quite important,
while the shapes of the important regions of object space (sky, grass,
trees, water) are not very important.

The apparent size of an object in an image is not relevant if
considered in isolation. This fact was already noted in Fig. ?. There
the apparent size of grass was the same as the apparent size of trees
located further from the observer. However, the size of a region could be
relevant, particularly in the initial stage of a scene analysis when one
is searching for large connected regions. Despite the importance of
descriptors derived in the spatial domain, we shall not use them in this
work. Currently available edge finders and region finders are tailored
for large homogeneous regions. In natural scenes, textured areas are
composed of small texture elements. Even to the extent that the boundaries
of small regions are determined, the data structures require unreasonably
large memory, since the boundary descriptions are no longer economical.

The next steps, description of elements and clustering of elements of
similar direction, size, color, or brightness, seem prohibitively time
consuming and difficult for grass, pebbles, sand, etc. The
one-dimensional interval analysis might have some utility but is very limited;
combined with other methods such as Fourier description, interval analysis
is potentially useful.

3.2 Texture Descriptors Derived in the Fourier Domain

In what follows we shall need some elementary and well-known notions
of Fourier analysis. They will be reviewed presently.

Consider a real picture function of two variables in matrix form,
$g(x,y)$, where $x$ and $y$ are variables from fixed intervals of natural
numbers $I_j = \{0, 1, \ldots, p_j - 1\}$. The two-dimensional discrete finite
Fourier transform of the function $g(x,y)$ is then given by

$$F(n,m) = \frac{1}{p}\sum_{x=0}^{p-1}\sum_{y=0}^{p-1} g(x,y)\,\exp\!\left[-\frac{2\pi i}{p}\,(xn + ym)\right],$$

where $p = p_1 = p_2$ and $i$ is the usual imaginary unit.

In general, $F(n,m)$ is a complex function, given uniquely by its
power spectrum $P(n,m)$ and phase spectrum $\Psi(n,m)$:

$$P(n,m) = \sqrt{F_{re}^2(n,m) + F_{im}^2(n,m)},$$

$$\Psi(n,m) = \arctan\!\left(\frac{F_{im}(n,m)}{F_{re}(n,m)}\right).$$
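A direct (slow) transcription of these definitions may make them concrete. It assumes a square $p \times p$ window indexed as `g[x][y]` and the $1/p$ normalization of the formula above, and it uses `atan2` in place of $\arctan(F_{im}/F_{re})$ so that the phase is defined in all four quadrants and when $F_{re} = 0$. The report computed the transform with the fast Fourier transform; this $O(p^4)$ loop and its function names are ours.

```python
import cmath
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dft2(g: Matrix) -> List[List[complex]]:
    """F(n, m) = (1/p) * sum_x sum_y g(x, y) * exp(-2*pi*i*(x*n + y*m)/p) on a p x p window."""
    p = len(g)
    F = [[0j] * p for _ in range(p)]
    for n in range(p):
        for m in range(p):
            total = 0j
            for x in range(p):
                for y in range(p):
                    total += g[x][y] * cmath.exp(-2j * math.pi * (x * n + y * m) / p)
            F[n][m] = total / p
    return F


def power_and_phase(F: List[List[complex]]) -> Tuple[Matrix, Matrix]:
    """P(n, m) = sqrt(F_re^2 + F_im^2); Psi(n, m) = phase angle of F(n, m)."""
    P = [[abs(v) for v in row] for row in F]
    Psi = [[math.atan2(v.imag, v.real) for v in row] for row in F]
    return P, Psi
```

For a constant 4 x 4 window all the energy sits at (0, 0): $F(0,0) = 16/4 = 4$ and every other coefficient vanishes.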

From the elementary properties of the Fourier operator it follows
that any real periodic function has a Fourier image symmetric with respect
to the origin ($F(-n,-m)$ is the complex conjugate of $F(n,m)$, so $P$ is
symmetric). An equally well-known but somewhat more interesting fact
is that the power spectrum is invariant with respect to translation in
the spatial domain, but not with respect to rotation. A simple
consequence of this property is that the directionality of a pattern in
the picture is preserved in the power spectrum, whereas the phase of the
transform changes under translation.
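A small numerical check of the translation property. It assumes a circular (wrap-around) shift inside the window, which is the translation under which the finite discrete transform is exactly invariant; an ordinary shift moves texture across the window border. The test pattern is arbitrary.

```python
import cmath
import math
from typing import List


def power_spectrum(g: List[List[float]]) -> List[List[float]]:
    """|F(n, m)| with F normalized by 1/p, for a p x p window."""
    p = len(g)
    return [[abs(sum(g[x][y] * cmath.exp(-2j * math.pi * (x * n + y * m) / p)
                     for x in range(p) for y in range(p))) / p
             for m in range(p)] for n in range(p)]


def circular_shift(g: List[List[float]], dx: int, dy: int) -> List[List[float]]:
    """Translate the window contents by (dx, dy) with wrap-around."""
    p = len(g)
    return [[g[(x - dx) % p][(y - dy) % p] for y in range(p)] for x in range(p)]


pattern = [[0, 1, 0, 0],
           [2, 0, 0, 3],
           [0, 0, 5, 0],
           [1, 0, 0, 0]]
shifted = circular_shift(pattern, 1, 2)
P1, P2 = power_spectrum(pattern), power_spectrum(shifted)
print(max(abs(a - b) for r1, r2 in zip(P1, P2) for a, b in zip(r1, r2)) < 1e-9)  # True
```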

If a function is periodic, partially periodic, or almost periodic,
then its Fourier transform compresses the data considerably without great
loss of information, and the relational features derived from the Fourier
image form a good description of periodic or almost periodic patterns.

As we have pointed out above, the power spectrum contains the
information about the form of a periodic picture function restricted to
a window. The phase spectrum, on the other hand, represents by and large
the locational (positional) information in a window.

We also said that directionality is preserved in the power spectrum.
This fact allows us to infer some gross shape properties: we are able
to distinguish directional and nondirectional components of texture.
For this reason, it is useful to transform the power spectrum from a
Cartesian coordinate system $\langle n, m\rangle$ into a polar coordinate
system $\langle r, \varphi\rangle$. Then in each direction $\varphi$, one can
regard $P(r,\varphi)$ as a one-dimensional function $P_\varphi(r)$. Similarly,
for each frequency $r$, $P(r,\varphi)$ is a one-dimensional function
$P_r(\varphi)$. Thus, in this method the description of the texture depends
on the form of the pair of functions $\langle P_r(\varphi), P_\varphi(r)\rangle$.
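A sketch of the conversion to the pair of one-dimensional functions. Following step (1) of Algorithm Monodirectional later in this section, the angular function is taken as the sum of $P(r,\varphi)$ over $1 \le r \le WD/2$, and symmetrically the radial function as the sum over angles (with the zero-frequency value kept separately at index 0). The nearest-bin rounding, the 16 angular bins on $[0, \pi)$ (matching the $(x-1)\pi/16$ axis used in the figures) and the unshifted DFT index layout are assumptions.

```python
import math
from typing import List, Tuple


def polar_profiles(P: List[List[float]], n_angles: int = 16) -> Tuple[List[float], List[float]]:
    """Return (angular, radial): angular[k] ~ P_r(phi_k), radial[r] ~ P_phi(r).

    P is a p x p spectrum in unshifted DFT layout (zero frequency at P[0][0]).
    """
    p = len(P)
    half = p // 2
    angular = [0.0] * n_angles          # summed over 1 <= r <= p/2 for each angle bin
    radial = [0.0] * (half + 1)         # summed over angles for each integer radius
    for n in range(p):
        for m in range(p):
            u = n - p if n >= half else n      # signed frequency indices
            v = m - p if m >= half else m
            r = math.hypot(u, v)
            if r == 0:
                radial[0] += P[n][m]           # zero frequency has no direction
                continue
            if r > half:
                continue                       # skip the corners beyond p/2
            # Real picture: the spectrum is symmetric about the origin, so fold onto [0, pi).
            phi = math.atan2(v, u) % math.pi
            k = int(round(phi / (math.pi / n_angles))) % n_angles
            angular[k] += P[n][m]
            radial[int(round(r))] += P[n][m]
    return angular, radial
```

Folding angles into $[0, \pi)$ relies on the symmetry of the power spectrum of a real picture noted above.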

Function $P_r(\varphi)$ determines whether there is a directional or
nondirectional component. If function $P_r(\varphi)$ is flat, then the
corresponding texture is nondirectional. If it has a few distinguished
peaks, the texture is directional: one peak leads to a monodirectional
texture, and two peaks, under certain constraints, lead to a bidirectional
texture.

The nondirectional texture could be homogeneous, noisy or blob-like.
Function $P_\varphi(r)$ distinguishes between noisy and blob-like texture.
The noisy texture corresponds to a flat nonzero function $P_\varphi(r)$, whereas
in the case of the blob-like texture, function $P_\varphi(r)$ will have some
peaks. The homogeneous texture corresponds to an almost constant function
$P_\varphi(r)$ for $r > 0$, with a large value of $P_\varphi(0)$. In the case of
a directional texture, function $P_\varphi(r)$ will have peaks similar to the
case of blob-like texture. The frequency $r_{max}$ at which $P_\varphi(r)$ is
maximal determines (through the wavelength, window size$/r_{max}$) roughly
the distance between two parallel stripes in the case of directional
texture, and the distance between two blobs in the case of a blob-like
texture.
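The decision rules of the two paragraphs above can be sketched as follows. The flatness tolerance `tol`, the ratio `dc_ratio`, and the peak rule (a local maximum above the mean) are hypothetical choices made for illustration; the report's own thresholds are not given here. The radial list is indexed from r = 0 so that the zero-frequency value is available for the homogeneous case.

```python
from typing import List


def peaks(values: List[float], circular: bool = False) -> List[int]:
    """Indices of local maxima that lie above the mean of `values`."""
    n = len(values)
    mean = sum(values) / n
    found = []
    for i, v in enumerate(values):
        if circular:                      # angles wrap around
            left, right = values[i - 1], values[(i + 1) % n]
        else:
            left = values[i - 1] if i > 0 else float("-inf")
            right = values[i + 1] if i < n - 1 else float("-inf")
        if v > mean and v >= left and v > right:
            found.append(i)
    return found


def is_flat(values: List[float], tol: float = 0.2) -> bool:
    """Flat: every value lies within tol * mean of a positive mean."""
    mean = sum(values) / len(values)
    return mean > 0 and all(abs(v - mean) <= tol * mean for v in values)


def classify(angular: List[float], radial: List[float], dc_ratio: float = 10.0) -> str:
    """angular ~ P_r(phi) over [0, pi); radial ~ P_phi(r) for r = 0, 1, 2, ..."""
    if not is_flat(angular):
        k = len(peaks(angular, circular=True))
        if k == 1:
            return "monodirectional"
        if k == 2:
            return "bidirectional"
        if k > 2:
            return "directional"
    tail = radial[1:]
    if is_flat(tail):
        mean_tail = sum(tail) / len(tail)
        return "homogeneous" if radial[0] > dc_ratio * mean_tail else "noisy"
    return "blob-like"
```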

We have shown one interpretation of function $P_\varphi(r)$. Now we want
to analyse a further possible interpretation of function $P_\varphi(r)$. Consider
a monodirectional pattern that appears as a one-dimensional (in the
particular direction) square wave function, shown in Fig. 4.


Fig. 4


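Denote the replicating (comb) function by $\mathrm{III}(x)$ and the wave form
by $f(x) = \operatorname{rect}(x)$, a unit pulse. The periodic function $g(x)$
is expressed as a convolution of $f(x)$ and $\mathrm{III}(x)$, thus

$$g(x) = f(x) * \mathrm{III}(x).$$

The Fourier transform of $g(x)$ is

$$\mathcal{F}\{g(x)\} = \mathcal{F}\{f(x)\}\cdot\mathcal{F}\{\mathrm{III}(x)\} = \operatorname{sinc}(s)\cdot\mathrm{III}(s).$$

Applying the similarity theorem for the element width $v$ and the
wavelength $\lambda$, we obtain in the Fourier domain

$$\mathcal{F}\left\{\operatorname{rect}\!\left(\frac{x}{v}\right) * \frac{1}{\lambda}\,\mathrm{III}\!\left(\frac{x}{\lambda}\right)\right\} = v\,\operatorname{sinc}(vs)\cdot\mathrm{III}(\lambda s).$$

This function is displayed graphically in Fig. 5.

Fig. 5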
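The relation can be checked numerically on a sampled square wave. In a window of N samples, frequency index r corresponds to s = r/N, so the comb $\mathrm{III}(\lambda s)$ puts energy only at multiples of N/λ, and the envelope first vanishes at r = N/v. The window length, period and pulse width below are arbitrary choices; the circular DFT stands in for the continuous transform, and in this sampled case the sinc envelope is replaced by its discrete counterpart, which still vanishes at r = N/v when v divides N.

```python
import cmath
import math
from typing import List


def square_wave(N: int, period: int, width: int) -> List[float]:
    """Pulses of `width` samples repeated every `period` samples: f * III in sampled form."""
    return [1.0 if x % period < width else 0.0 for x in range(N)]


def amplitude(g: List[float]) -> List[float]:
    """|G(r)| for r = 0 .. N/2 (unnormalized 1-D DFT)."""
    N = len(g)
    return [abs(sum(g[x] * cmath.exp(-2j * math.pi * r * x / N) for x in range(N)))
            for r in range(N // 2 + 1)]


N, period, width = 64, 8, 2
A = amplitude(square_wave(N, period, width))
first_peak = next(r for r in range(1, len(A)) if A[r] > 1e-6)
print(first_peak, N / first_peak)   # 8 8.0   -> wavelength lambda = N / first peak
print(round(A[N // width], 9))      # 0.0     -> envelope zero at r = N / v
```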


It is clear that we can measure these frequencies (the spacing $1/\lambda$ of
the impulses and the first zero $1/v$ of the envelope) in the power spectrum
from the function $P(r)$, for every directionality and window size $w$.
Consequently we can estimate (how well depends on the brightness function)
the wavelength $\lambda$, as before, and, in addition, the size of the
smallest element, $v$. Both $v$ and $\lambda$ will be parameters associated
with each description. Examples of functions $P(\varphi)$ and $P(r)$ of texture
samples will be presented next. The size of the samples is 32 x 32 points.
The points on the y axis have the corresponding values of the functions
$P(\varphi)$ and $P(r)$ respectively. The points on the x axis in the graph for
function $P(\varphi)$ represent the values $(x-1)\pi/16$, for $x = 1, 2, \ldots, 16$.
The points on the x axis in the graph for function $P(r)$ are just the
actual values of frequency $r = 1, \ldots, 16$.


Each pair of functions $\langle P(\varphi), P(r)\rangle$ will be described by some
parameters, listed in a table. Below is the list of the parameters
and their description.

NAME: The natural-language names of the texture samples.
DESCRIPTOR: A hypothetical description of the sample according to
some criteria (thresholds) applied to the functions $\langle P(\varphi), P(r)\rangle$.

MAX P(φ): The maximal value of $P(\varphi)$.

φ_max: The $\varphi$ such that $P(\varphi_{max}) = \max P(\varphi)$.

WIDTH φ: The distance between $\varphi_1$ and $\varphi_2$, where
$\varphi_1 < \varphi_{max} < \varphi_2$, $P(\varphi_1)$ is the minimum of $P(\varphi)$ on the
left side of $\varphi_{max}$, and $P(\varphi_2)$ is the minimum of $P(\varphi)$ on the right
side of $\varphi_{max}$.

DIR: If the descriptor is directional, first perform a fan filtering
in such a way that the fan filter is centered at $\varphi_{max}$, then find
$\max P(n,m) = P(n_{max}, m_{max})$ and compute $DIR = \arctan(m_{max}/n_{max})$.
If the descriptor is nondirectional, just find
$\max P(n,m) = P(n_{max}, m_{max})$ and compute DIR as above.

RO: The wavelength computed from the point of maximal energy:

$$RO = \frac{\text{window size}}{\sqrt{n_{max}^2 + m_{max}^2}}.$$

M_φ: The mean value of $P(\varphi)$.
V_φ: The variance of $P(\varphi)$.

MAX P(r): The maximal value of $P(r)$.

r_max: The $r$ such that $P(r_{max}) = \max P(r)$.

WIDTH r: The distance between the center of $P(r)$ and the
threshold value of the envelope of $P(r)$.

M_r: The mean value of $P(r)$.

V_r: The variance of $P(r)$.

v: The element size, = window size / WIDTH r of the envelope.

λ: The spacing between elements, = window size / frequency of the first peak.

In the case of bidirectional texture a pair of values is listed
for the following parameters:

MAX P(φ), φ_max, WIDTH φ, DIR and RO.
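A sketch of how several of these parameters could be computed from the angular function and from the location of the spectral maximum. The nearest-local-minimum reading of WIDTH φ, the use of the population variance, and the 1-based φ index (index x stands for (x-1)π/16, as in the figures) are our assumptions. `atan2` is used for DIR so that n_max = 0 is handled.

```python
import math
from typing import Dict, List, Tuple


def angular_parameters(angular: List[float]) -> Dict[str, float]:
    """MAX, phi_max (1-based index and radians), WIDTH, mean M and variance V of P(phi)."""
    n = len(angular)
    k = max(range(n), key=lambda i: angular[i])
    left = k
    while left > 0 and angular[left - 1] <= angular[left]:
        left -= 1          # descend to the nearest minimum on the left
    right = k
    while right < n - 1 and angular[right + 1] <= angular[right]:
        right += 1         # descend to the nearest minimum on the right
    mean = sum(angular) / n
    return {
        "MAX": angular[k],
        "phi_max": k + 1,
        "phi_max_rad": k * math.pi / n,
        "WIDTH": right - left,
        "M": mean,
        "V": sum((v - mean) ** 2 for v in angular) / n,
    }


def dir_and_ro(window_size: int, n_max: int, m_max: int) -> Tuple[float, float]:
    """DIR = arctan(m_max / n_max) and RO = window size / sqrt(n_max^2 + m_max^2)."""
    return math.atan2(m_max, n_max), window_size / math.hypot(n_max, m_max)


# Hypothetical maximum at (n_max, m_max) = (0, 11) in a 32-point window.
d, ro = dir_and_ro(32, 0, 11)
print(round(d, 2), round(ro, 2))   # 1.57 2.91
```

These two values happen to be close to the DIR and RO entries listed for LINES in Table 2; the location (0, 11) is an illustrative assumption, not a value reported for that sample.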

The texture names are at the top of each picture displaying the
corresponding functions $P(\varphi)$ and $P(r)$. The actual samples of texture
(lines, wood, circle, and sand) are in Figures 14, 15, 20, and 21.
The texture water is a sample from the upper left corner of the picture
in Fig. 1.

Fig. 6 and Fig. 7a display functions $P(\varphi)$ and $P(r)$ of the textures
parallel lines and water, followed by the table of parameters. For the
identification of parameter $v$ we have used the directional part of the
water picture. The filtered alternative of functions $P(r)$ and $P(\varphi)$
for water is in Fig. 7b.




TABLE 2

NAME          LINES              WATER
DESCRIPTOR    MONODIRECTIONAL    MONODIRECTIONAL
MAX P(φ)      242.12             13.5
φ_max         9                  9
WIDTH φ       4                  6
DIR           1.57               1.57
RO            2.9                16
M_φ           39.8               5.08
V_φ           14.22              0.64
MAX P(r)      105.2              6.96
r_max         11                 4
WIDTH r       16                 16
M_r           37.9               4.86
V_r           7.42               0.564

COMMENTS: Both textures, lines and water, are described by the program
as monodirectional (they have one significant peak in $P(\varphi)$ and form,
geometrically speaking, parallel vertical lines). That is why $\varphi_{max}$ and
DIR coincide in both cases: $\varphi_{max} = 9$ stands for $(9-1)\pi/16 = \pi/2$,
and DIR = 1.57 is $\pi/2$. The contrast in the picture of lines is much higher
than in the picture of water, as indicated by the values of $P(\varphi)$ and
$P(r)$. The regular pattern of lines shows higher values of the directional
component relative to the nondirectional component than the texture of
water does (compare, for instance, the values of MAX P(φ) and $M_\varphi$). The
water waves are broken, and thus they form parallel broken lines organized
in random fashion. This shows up in the function $P(r)$ of the water texture,
which is rather flat in comparison with $P(r)$ of the texture of lines.
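The comparison suggested in these comments can be written out from the entries of Table 2 (ratios rounded to two decimals):

$$\frac{\max P(\varphi)}{M_\varphi}:\quad \frac{242.12}{39.8} \approx 6.08\ \text{(lines)},\qquad \frac{13.5}{5.08} \approx 2.66\ \text{(water)};$$

$$\frac{\max P(r)}{M_r}:\quad \frac{105.2}{37.9} \approx 2.78\ \text{(lines)},\qquad \frac{6.96}{4.86} \approx 1.43\ \text{(water)}.$$

The smaller second ratio for water is one way of quantifying its flatter $P(r)$.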

In  Fig.  8  we  display  a  sample  of  grass  from  the  scene  in  Fig.  1. 

The  upper  left  window  in  Fig.  8  is  the  original  sample,  the  upper  right 
window  is  its  corresponding  power  spectrum,  the  lower  left  window  is 
the  power  spectrum  after  a  high  pass  filter  and  the  lower  right  window 
is  the  resynthesized  original  picture  after  the  high  pass  filter. 

This example is presented in order to demonstrate the necessity
of separating slow changes from the real texture pattern. The
rationale for this is that most of the objects (texture elements) tend
to have the same reflectivity while the lighting varies smoothly; thus
shading generates a low-frequency component in the Fourier domain.
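A sketch of removing slow shading with a high-pass filter in the Fourier domain, followed by resynthesis of the picture (as in the lower windows of Fig. 8). The report does not specify its filter here; an ideal circular filter that zeroes every coefficient whose radial frequency is below a hypothetical cutoff `r_cut` is assumed.

```python
import cmath
import math
from typing import List


def dft2(g: List[List[complex]], inverse: bool = False) -> List[List[complex]]:
    """Forward DFT (unnormalized) or inverse DFT (normalized by 1/p^2) on a p x p array."""
    p = len(g)
    sign = 1 if inverse else -1
    out = [[sum(g[x][y] * cmath.exp(sign * 2j * math.pi * (x * n + y * m) / p)
                for x in range(p) for y in range(p))
            for m in range(p)] for n in range(p)]
    if inverse:
        out = [[v / (p * p) for v in row] for row in out]
    return out


def high_pass(g: List[List[float]], r_cut: float) -> List[List[float]]:
    """Zero all frequencies with radius < r_cut, then resynthesize the picture."""
    p = len(g)
    F = dft2(g)
    for n in range(p):
        for m in range(p):
            u = n - p if n >= p // 2 else n      # signed frequency indices
            v = m - p if m >= p // 2 else m
            if math.hypot(u, v) < r_cut:
                F[n][m] = 0j
    return [[v.real for v in row] for row in dft2(F, inverse=True)]
```

For example, a picture made of a constant level, a slow cosine shading of frequency 1 along x and a stripe pattern of frequency 3 along y comes back, after `high_pass(..., 2)`, as the stripe pattern alone.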

Functions $P(\varphi)$ and $P(r)$ of the textures grass, wood and canvas
are displayed in Figs. 9a, 10a, and 11a respectively. The analyzed samples
of grass are in Fig. 8, of wood in Fig. 15, and of canvas in Fig. 18.
In order to consider the main directionality, and thus to be able
to determine $\lambda$ and $v$, we display the filtered alternatives in Fig. 9b
for grass, Fig. 10b for wood, and Figs. 11b and 11c for canvas (for one
directionality). The table of their corresponding parameters is below:


TABLE 3


NAME             GRASS            WOOD               CANVAS
DESCRIPTOR       BIDIRECTIONAL    MONODIRECTIONAL    BIDIRECTIONAL
MAX P(φ)         <?, 7.5>         ?                  <108, 80>
φ_max            <5, 13>          ?                  <1, 9>
WIDTH φ          <?, 4>           3                  <?, 2>
DIR              <0.463, ?>       2.35               <1.37, 0>
RO               <?, 16>          8.0                <16, 8>
M_φ              4.76             ?                  49.3
V_φ              0.53             3.76               5.46
MAX P(r)         5.63             44.8               120.34
r_max            4                3                  ?
WIDTH r          16               16                 9
M_r              ?                31.46              47.14
V_r              ?                ?                  7.64
λ                <8, 16>          10                 <16, 8>
λ max            8                8                  ?
v for max DIR    1                1                  1.8

COMMENTS: First of all, notice that grass is described as bidirectional,
contrary to what would be expected. The reason is that even after high-pass
filtering, there is still significant slow change left (wavelength = 16),
which forms the second peak. One needs to know more about the scene (its
illumination, continuity, context) in order to remove this kind of slow
change. It is impossible to handle this situation appropriately without
further knowledge about the area, because the same component
(wavelength = 16) which in the case of grass is undesirable is, in the
case of the canvas texture, an essential part of its description.

Function $P(r)$ in the case of grass and wood shows similarities, which
suggests that both of these textures have noisy, irregular backgrounds.
On the other hand, the canvas texture displays significant peaks at low
frequencies and decreasing power at higher frequencies.

For more detailed analyses of $P(r)$, one has to separate the
different directionalities. This is what we have followed up in Figs.
9b, 10b, 11b and 11c.

The last two examples, the textures of blobs and sand, demonstrate
the differences between nondirectional textures. Figs. 12 and 13 show
functions $P(\varphi)$ and $P(r)$ of the texture samples recorded in Fig. 20
and Fig. 21 respectively. Table 4 contains their corresponding parameters.

$P(\varphi)$ is a flat function for both textures, as is to be expected.
$P(r)$ in the case of blobs has one significant peak, whereas in the case
of sand $P(r)$ is approximately flat.


TABLE 4

NAME             BLOBS        SAND
DESCRIPTOR       BLOB-LIKE    NOISY
MAX P(φ)         82.26        73.74
φ_max            13           13
WIDTH φ          3            3
DIR              2.33         2.35
RO               11.31        5
M_φ              60.2         32.8
V_φ              2.72         2.48
MAX P(r)         120.46       75.8
r_max            3            6
WIDTH r          6            12
M_r              61.70        ?
V_r              6.52         3.18
λ                10           5
v                ?            1.3

We must make some comments about the differences between
continuous and finite discrete Fourier transforms. The continuous
Fourier transform exists for every function with finite energy, while the
finite discrete Fourier transform exists for any function. Our interpretations
will be based on the continuous transform, and the actual computations on the
discrete transform (fast Fourier transform). The discrete transform is
really a Fourier series. A continuous Fourier transform is rotationally
invariant (except for windowing effects), in the sense that rotating the
image rotates its transform by the same angle, while a discrete transform
has distinguished axes along the coordinate axes and the diagonals. Thus
a directional image has a continuous Fourier transform in a very narrow
band, while the discrete transform has a narrow-band transform only for
directions along the preferred axes. There is a corresponding difficulty
in defining fan filters, which we have not succeeded in solving. The
difficulty with narrow fan filters is demonstrated in the following
example: a line with directionality $\varphi = 22\tfrac{1}{2}^\circ$ in digitized form,
with a window size of 8 x 8 points. Due to the sampling problem, the
line is represented by only four points instead of the desired 8 points.

The values of the corresponding power spectrum are in Matrix 2. From
inspecting the values in Matrix 2 it is clear that there is a spread
of energies in different directions besides the expected direction
$\varphi' = 112\tfrac{1}{2}^\circ$. This effect is due to poor sampling. For more details
see Huang (1970).


MATRIX 1  f(x,y)                MATRIX 2  P(n,m)
(n, m = -4, ..., 3; the entries of Matrix 2 take the values 0, 1, 2.6 and 4)

All the real values in F(n,m) have to be divided by the coefficient 64.
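The spread of energy in Matrix 2 can be reproduced approximately. Assume a digitized line of four unit-valued pixels stepping two columns per row (a hypothetical digitization with slope 1/2, i.e. about 26.6 degrees rather than exactly 22.5 degrees). The unnormalized amplitudes |F(n,m)| in an 8 x 8 window then take only the values 0, about 1.08, about 2.61 and 4, spread over many directions; Matrix 2's entries 0, 1, 2.6 and 4 look like roundings of these.

```python
import cmath
import math
from typing import List, Tuple


def amplitude_unnormalized(points: List[Tuple[int, int]], p: int) -> List[List[float]]:
    """|sum over lit pixels of exp(-2*pi*i*(x*n + y*m)/p)| for unit pixels in a p x p window."""
    return [[abs(sum(cmath.exp(-2j * math.pi * (x * n + y * m) / p) for x, y in points))
             for m in range(p)] for n in range(p)]


# Hypothetical digitization of a shallow line: four pixels, two columns per row.
line = [(0, 0), (2, 1), (4, 2), (6, 3)]
A = amplitude_unnormalized(line, 8)
distinct = sorted({round(v, 1) for row in A for v in row})
print(distinct)  # [0.0, 1.1, 2.6, 4.0]
```

Here |F| depends only on s = (2n + m) mod 8, so the energy is spread along a whole family of lines in the (n, m) plane instead of being concentrated in one narrow direction.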




One should make a note of a fairly important though elementary
mathematical fact, namely that the Fourier transform does not preserve
functional restriction. More specifically, if $g(x,y)|_W$ denotes the
restriction of the image function $g(x,y)$ to a window $W$ (so that
$g(x,y)$ is truncated outside $W$), then

$$\mathcal{F}[g(x,y)|_W] = \mathcal{F}[g(x,y)]|_W$$

is true for every $W$ only when $g(x,y)$ is periodic with period equal
to the size of $W$. Thus the Fourier image of a function truncated outside
a window will in general differ from the restriction of the Fourier image
of the whole function, which depends also on the part of $g(x,y)$ whose
domain is outside $W$. What this means practically is that certain texture
elements could be split in half by windowing, and as a consequence an
improper interpretation would be derived. This problem can be partly
compensated for by overlapping windows.
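A sketch of overlapping windowing: fixed-size windows are placed on a grid whose stride is smaller than the window, so that an element cut by one window border lies inside a neighboring window. The half-window stride below is an assumption; the report does not state the overlap it used.

```python
from typing import List, Tuple


def window_origins(height: int, width: int, size: int, stride: int) -> List[Tuple[int, int]]:
    """Top-left corners of size x size windows; stride < size makes them overlap."""
    return [(r, c)
            for r in range(0, height - size + 1, stride)
            for c in range(0, width - size + 1, stride)]


def extract(image: List[List[float]], r: int, c: int, size: int) -> List[List[float]]:
    """Copy the size x size window whose top-left corner is (r, c)."""
    return [row[c:c + size] for row in image[r:r + size]]


print(len(window_origins(64, 64, 32, 32)))  # 4 disjoint windows
print(len(window_origins(64, 64, 32, 16)))  # 9 half-overlapping windows
```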

Human perception allows us to discount smooth changes in shading,
and this allows us to separate shading from edges. The Fourier transform,
on the contrary, reflects not only edges but also slow changes, which
are ignored in human visual perception. Perhaps the simplest way of
demonstrating this is by recalling the basic dictionary of the Fourier
transform: a rectangular impulse is transformed into a sinc function, a
triangular impulse into a sinc² function, and a cosine signal into two
impulses. We are accustomed to regarding images in terms of homogeneous
regions with sharp boundaries, and to describing elements by brightness
and color contrast and outline shape. In the Fourier domain, these become
jumbled in a way that is only approximately resolved by our heuristics;
thus they are not always usefully described.
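For reference, the dictionary entries mentioned above, with $\operatorname{sinc}(s) = \sin(\pi s)/(\pi s)$, $\operatorname{rect}$ the unit-width pulse, $\Lambda(x) = \max(0, 1 - |x|)$ the unit triangle, and the transform taken as $F(s) = \int f(x)\,e^{-2\pi i x s}\,dx$:

$$\operatorname{rect}(x) \;\leftrightarrow\; \operatorname{sinc}(s), \qquad \Lambda(x) \;\leftrightarrow\; \operatorname{sinc}^2(s), \qquad \cos(2\pi s_0 x) \;\leftrightarrow\; \tfrac{1}{2}\left[\delta(s - s_0) + \delta(s + s_0)\right].$$

A sharp-edged pulse thus spreads over all frequencies, while a slow cosine shading concentrates at $\pm s_0$ near the origin.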


In addition, the texture elements (their shape) and their
organization are also jumbled together in the Fourier domain. So, for
instance, dots and small line segments organized in parallel lines will
both be described as a monodirectional texture; they are thus not
described in full detail. As we said before, for more detail one has
to apply spatial, local operators.

For areas of a scene in which homogeneous regions are too small
for the usual edge-finding and region-growing techniques, the Fourier
transform provides useful and compact descriptors. Many of the examples
of textured regions showed linear texture elements, crudely aligned,
with roughly uniform size and spacing. These shape descriptors have
natural counterparts in the Fourier domain: directionality in the
spatial domain corresponds to a directional transform, and uniform spacing
corresponds roughly to dominant frequencies in the Fourier domain. However,
a much more common uniformity in the spatial domain, constant-size
elements randomly distributed, does not have a clear counterpart. There
has been much oversimplification of the use of the frequency spectrum.

In reality, the frequency spectrum appears to have very restricted utility;
however, that utility corresponds to a few descriptors which have primary
importance in human perception. Since most descriptors are spatial-domain
descriptors not directly related to the transform, frequency-domain
techniques are quite limited. Nevertheless, the proper combination
of the Fourier descriptors with spatial-domain techniques is
the suggested approach for a texture identifier. The Fourier technique
will compactly describe large areas with repetitive features. The
description will contain some characteristics of the shape of the elements
and of their organization, as directional as opposed to nondirectional. It
will fail to capture detailed descriptions of the shapes of elements;
moreover, the Fourier technique cannot be very local. So the spatial
technique can complement the Fourier technique, being more local and
therefore more accurate in some sense.

All this concerns black-and-white pictures. In colored pictures,
each point is represented by at least a three-dimensional real vector;
the coordinates could represent either the brightness through red,
green and blue filters (possibly other filters), or their normalized
values (R/(R + G + B), G/(R + G + B), B/(R + G + B)), or perhaps the
chromatic triple (hue, brightness, saturation).
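The normalized coordinates can be computed as below. Returning zeros for a black pixel, where the ratios are undefined, is our convention, not the report's. Because each component is divided by R + G + B, the triple is unchanged when all three brightnesses are scaled by the same factor.

```python
from typing import Tuple


def normalized_rgb(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """(R/(R+G+B), G/(R+G+B), B/(R+G+B)); zeros for a black pixel by convention."""
    total = r + g + b
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (r / total, g / total, b / total)


print(normalized_rgb(60, 30, 30))    # (0.5, 0.25, 0.25)
print(normalized_rgb(120, 60, 60))   # (0.5, 0.25, 0.25): same color, twice as bright
```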

It appears that color is a local property, meaning that the color
is determined by local contrast (with global constancy judgment). The
Fourier transform is an integral operator, which consequently mixes up
different local properties. Direct application of the Fourier texture
operator to an area is not useful for color in the general case; however,
under certain constraints, one can suggest some applications of the
Fourier operator to colored textures.

The  simplest  case  is  when  the  color  is  constant  and  the  texture 
is  encoded  in  the  brightness  function.  Examples  are  grass,  water,  brick 
wall, etc. In this case the Fourier operator is used in the same way as
for black-and-white pictures.

The  second  case  is  when  the  texture  is  formed  by  only  two 
alternating  colors.  Here,  let  us  assume  the  representation  of  the  color 
of  a  point  as  a  vector  whose  coordinates  will  contain  the  brightness 
of  the  point  taken  through  red,  green  and  blue  filters  respectively. 


Since we have only two colors, the brightness functions will clearly
be correlated or anti-correlated with each other. Fourier analysis of
these functions could give a reasonably good description of the texture in
terms of the contrast of the color components. This is analogous to
spatial domain analysis.

This  discussion  points  out  a  crucial  weakness  of  Fourier  transform 
techniques  in  dealing  with  color. 


3.3 Concrete Texture Descriptors: Local Descriptors

According  to  our  theme,  a  texture  is  characterized  by  a  structure 
of  texture  elements  and  their  spatial  distribution.  Each  descriptor  is 
associated  with  a  procedure  and  a  set  of  geometric  measures. 

The  descriptors  may  be  derived  from  parameters  that  come  from  the 
spatial  and/or  Fourier  domain.  In  fact,  often  we  will  have  to  deal 
with  two  different  measurements  of  the  same  parameters  (e.g.,  length, 
width,  direction),  one  performed  in  the  spatial  domain  and  the  other  in 
the Fourier domain. Here we seek a common interpretation of these
measurements.

The  input  data  from  which  we  derive  the  local  Fourier  parameters 
is  the  power  spectrum  of  the  picture  over  every  window.  Since  we  are 
able  to  describe  only  what  we  measure,  the  technique  that  we  implement 
will  determine  the  system  of  descriptors  we  can  use.  In  particular,  the 
technique  of  Fourier  analysis  leads  to  the  following  system  of  descriptors: 
monodirectional, bidirectional, blob-like, homogeneous, and random. Using
the  input  data  of  a  local  area  one  may  expect  to  have  more  than  just 
one  descriptor. 

Next  we  discuss  the  particular  types  of  local  descriptors  we  shall 
be  using  in  our  work. 

(a) Monodirectional Texture

In the spatial domain, a monodirectional texture is approximately
invariant along some direction. An example of monodirectional texture
is a system of parallel stripes. In the Fourier domain the spectrum
is approximately zero along the direction of near invariance, and is
concentrated along the direction normal to it. We take this description
to be adequate also for spatial-domain elements with some curvature, or
superimposed on a nondirectional background. It makes sense to describe
as directional a spectrum in which the dominant energy is along one
direction and the directional peak is narrow.

Next we give a qualitative description of an algorithm that provides
monodirectional descriptors. As noted above, this algorithm assumes that
the texture shows a concentration of energy in a certain direction of the
Fourier domain. Thus we want to find a peak in the function of energy vs.
angle. This function is a sum of energies over a fan with a certain angle
$\gamma$ and direction $\varphi$. Remember that the data structure is a
matrix, so only four directions (horizontal, vertical, and the two
diagonals) coincide with directions of the matrix grid. The fan technique
makes it possible to include also the points near the investigated
direction.

The peaks of the function are defined as its local maxima that exceed the
average value of the directional energy function. The width of a peak is
defined as the distance between two consecutive zero crossings of the
directional energy function minus its average value. The algorithm uses
two additional parameters, namely $\gamma$ and $E_{dir}/(E - E_{dir})$,
where the latter must be greater than or equal to 2, while the former
should not be greater than $\pi/10$. The angle $\gamma$ is the measure of
the width of the peak, and its threshold value corresponds to the limit of
a useful directional description. $E_{dir}$ is the energy in the angular
strip (fan) and $E$ is the total energy. The threshold on their ratio
corresponds to the condition that the ratio length/width must be at least
two.
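The fan-summed energy function and the peak and width definitions above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the original implementation: the window is a square list of lists, a direct (slow) DFT stands in for the FFT, the directions $\langle 0,\pi)$ are split into 20 fans so that each fan spans $\pi/20$, widths are measured in whole fans, and the helper names `power_spectrum`, `fan_energy`, and `peaks` are hypothetical.

```python
import cmath
import math

def power_spectrum(win):
    """|F(u, v)|^2 of a square window via a direct (slow) 2-D DFT."""
    n = len(win)
    return [[abs(sum(win[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                     for x in range(n) for y in range(n))) ** 2
             for v in range(n)] for u in range(n)]

def fan_energy(spec, n_fans=20):
    """P_r(phi): energy summed over fans of angle pi / n_fans, radii 1 <= r <= WD/2."""
    n = len(spec)
    energy = [0.0] * n_fans
    for u in range(n):
        for v in range(n):
            fu = u - n if u > n // 2 else u          # signed frequency indices
            fv = v - n if v > n // 2 else v
            if 1 <= math.hypot(fu, fv) <= n / 2:
                phi = math.atan2(fv, fu) % math.pi   # fold opposite directions together
                energy[int(phi / math.pi * n_fans) % n_fans] += spec[u][v]
    return energy

def peaks(energy):
    """Peaks of P_r(phi) as (direction, E_dir, width) triples.

    A peak is a local maximum above the mean; its width is the angular distance,
    in whole fans, between the zero crossings of P_r(phi) minus its mean."""
    m = len(energy)
    mean = sum(energy) / m
    d = [e - mean for e in energy]
    found = []
    for k in range(m):                               # P_r(phi) is periodic in phi
        if d[k] > 0 and energy[k] >= energy[k - 1] and energy[k] > energy[(k + 1) % m]:
            width, j = 1, k - 1
            while d[j % m] > 0 and width < m:        # walk left to the zero crossing
                width, j = width + 1, j - 1
            j = k + 1
            while d[j % m] > 0 and width < m:        # walk right to the zero crossing
                width, j = width + 1, j + 1
            found.append(((k + 0.5) * math.pi / m, energy[k], width * math.pi / m))
    return found
```

Because the spectrum of a real picture is symmetric about the origin, folding opposite directions into one fan doubles the energy per fan without moving the peaks. For an 8 × 8 window of vertical stripes, `[[1.0 if y % 4 < 2 else 0.0 for y in range(8)] for x in range(8)]`, the sketch finds a single peak near $\pi/2$ that is one fan wide.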


The algorithm that determines this descriptor in the Fourier domain is
given below:


Algorithm Monodirectional:

(1) Form the function

$$P_r(\varphi) = \sum_{r=1}^{WD/2} P(r,\varphi),$$

where WD is the window size.

(2) Find the number $n$ of peaks of the function $P_r(\varphi)$.

(3) If $n = 1$, then take the magnitude of the peak,

$$E_{dir} = \max_{\varphi} P_r(\varphi),$$

and go to step (4); else mark the window with the message "There is more
than one direction, do further analysis" and go to the end.

(4) If $E_{dir}/(E - E_{dir}) \ge 2$, then check the width of the peak,
which corresponds to the angular strip $\gamma$, and continue in step (5);
else mark the window with the message "There could be a blob-like or a
noisy texture here, do further analysis" and go to the end.

(5) If $\gamma < \pi/10$, then mark the window "Monodirectional texture"
and go to the end; else mark the window "It is a monodirectional texture
with nondirectional components" and go to the end.

(6) End.
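Once the peaks of $P_r(\varphi)$ are known, steps (2) to (5) reduce to a short decision cascade. The sketch below assumes each peak is given as a hypothetical `(direction, E_dir, gamma)` triple and that `total_energy` is $E$, the total energy of $P_r(\varphi)$; the thresholds are the ones stated in the text, and the function name is illustrative only.

```python
import math

def monodirectional(peak_list, total_energy, gamma_max=math.pi / 10, ratio_min=2.0):
    """Steps (2)-(5) of Algorithm Monodirectional.

    peak_list    -- (direction, E_dir, gamma) for each peak of P_r(phi)
    total_energy -- E, the total energy of P_r(phi) over all directions
    """
    if len(peak_list) != 1:   # step (3): the text uses this message for every n != 1
        return "There is more than one direction, do further analysis"
    _, e_dir, gamma = peak_list[0]
    rest = total_energy - e_dir
    ratio = math.inf if rest <= 0 else e_dir / rest
    if ratio < ratio_min:     # step (4): E_dir / (E - E_dir) >= 2 required
        return "There could be a blob-like or a noisy texture here, do further analysis"
    if gamma < gamma_max:     # step (5): narrow peak
        return "Monodirectional texture"
    return "It is a monodirectional texture with nondirectional components"
```

For example, a single peak holding 900 of 1000 energy units (ratio 9) with width 0.16 rad is reported as "Monodirectional texture".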


The  power  spectrum  along  the  direction  of  maximum  power  is  the 
power  spectrum  normal  to  the  invariant  direction.  In  the  spatial 
domain  humans  characterize  these  profiles  by  step  functions. 

In the Fourier domain we can find the approximate wavelength of the
parallel stripes (the distance between two neighboring stripes), and the
width of the stripes, from our previous analysis. One way of identifying
the width in the spatial domain would be to use one-dimensional interval
analysis along a direction. This technique could also be used for more
precise localization of monodirectional textured edges than one can
achieve in the Fourier domain. The interval analysis method has not
yet been implemented.
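The wavelength estimate rests on the usual DFT relation: a spectral peak at frequency index $r$ in a profile of $N$ samples corresponds to a period of about $N/r$ samples. A minimal sketch, assuming a 1-D brightness profile taken across the stripes (for instance, the window averaged along the stripe direction; by the projection-slice theorem its spectrum is the 2-D spectrum sampled along the direction of maximum power). The function name is hypothetical.

```python
import cmath
import math

def stripe_spacing(profile):
    """Approximate stripe spacing (in samples) from the dominant non-DC frequency."""
    n = len(profile)
    power = [abs(sum(profile[x] * cmath.exp(-2j * math.pi * r * x / n)
                     for x in range(n))) ** 2
             for r in range(n // 2 + 1)]
    r_peak = max(range(1, n // 2 + 1), key=lambda r: power[r])  # skip r = 0 (DC)
    return n / r_peak
```

For example, a profile of two bright and two dark samples repeated four times, `[1, 1, 0, 0] * 4`, gives a spacing of 4.0.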

The above algorithm has been implemented and tested on examples.
A sample is shown in Fig. 14 and Fig. 15.

In Fig. 14 we have a texture of parallel lines and in Fig. 15 we
have a texture of parallel stripes (wood grain). In both figures the
upper left pictures show the original textures, divided into four
windows (each window is of size 32 by 32 points). The pictures in the
upper right corner are resynthesized textures, produced according to
the description. The pictures in the lower left corner show the power
spectrum of the original textures. Note the two different directionalities
in the lower quadrants of the picture. Here the diagonal directionality
corresponds to the wood grain pattern and the vertical directionality
represents the shading effect (slow changes in brightness).

(b)  Bidirectional  Texture 

The descriptor "bidirectional" is associated with two sets of
monodirectional stripes, as described for monodirectional texture. This
description belongs to the spatial domain and does not have a unique
Fourier counterpart. In terms of the power function vs. angle $\varphi$,
it corresponds to two distinguished peaks of $P_r(\varphi)$, while the
converse is not true.

If the function $P_r(\varphi)$ for $\varphi \in \langle 0,\pi\rangle$ has
two distinguished peaks, then it could represent at least one of the
following two cases in the picture (its window):

(α) two different directional textured subregions are adjacent
(next to each other) in one window, or

(β) two different directional regions are superimposed (one is
on top of the other).

The problems discussed above are shown in Figs. 16, 17, and 18, where
the pictures in Fig. 16 show case (α) and the pictures in Figs. 17 and 18
exhibit case (β).




Each of the figures displays nine pictures, whose meaning is explained in
the table below, where the row and column numbers refer to particular
pictures in Fig. 16.

<1,2>  [...] process, performed in every window separately.
<1,3>  The "complement" of <1,2>.
<2,1>  The power spectrum of [...]
<2,2>  The phase spectrum of <1,1>. Here the phase is transformed from
       the range $\langle -\pi,\pi\rangle$ to $\langle 0,2\pi\rangle$.
<2,3>  The absolute values of the phase spectrum of picture <1,1>.
<3,1>  The power spectrum of picture <1,1>, parametrized by the absolute
       value of the phase in the range $\langle 0,\pi/3\rangle$.
<3,2>  The same as in <3,1>, but this time with the range
       $\langle \pi/3,2\pi/3\rangle$.
<3,3>  The same as in picture <3,2>, but with the range
       $\langle 2\pi/3,\pi\rangle$.

Fig. 16

Fig. 17


The description of the pictures in Fig. 17 is the same as that of the
pictures in Fig. 16, except for row 1, column 1 (picture <1,1>), where we
have four windows, each containing a superposition of horizontal and
vertical lines.

Let us concentrate for a moment on the windows of the first and
third quadrants of picture <1,1> in Fig. 17. Each of these windows
is a composition of horizontal line textures and vertical line textures.

It is impossible to distinguish the case of separation (α) from that of
overlap (β) in the power spectrum, because the power spectrum discards the
phase and, with it, the positions of the components. Using the phase
spectrum, one would hope to separate the region containing the horizontal
lines from the region containing the vertical lines, or at least to
identify their positional relationship. Unfortunately, it is not known at
present how to carry out this separation. To our knowledge, no one has yet
used the phase spectrum in a meaningful way.


Fig. 18

The pictures in Fig. 19 show a vector display of the complex function
$F(n,m)$ along one direction (in our case, the horizontal and vertical
directions) for each $\langle n,m\rangle$. The direction of each vector
equals the phase, and its length corresponds to the value of the power.
As one can see from the pictures, there is no evident distinctive feature
that would describe the relationship Left-Right or Right-Left.
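The failure of the power spectrum to separate a Left-Right arrangement from a Right-Left one follows from the shift theorem: a circular shift multiplies $F(u,v)$ by a phase factor of modulus one, so $|F(u,v)|^2$ is unchanged while the phase is not. The sketch below checks this on a hypothetical 8 × 8 window with horizontal lines in the left half and vertical lines in the right half, against the same window shifted so that the halves swap. The helper name `dft2` is illustrative.

```python
import cmath
import math

def dft2(win):
    """Direct 2-D DFT F(u, v) of a square window given as a list of lists."""
    n = len(win)
    return [[sum(win[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

n = 8
# Hypothetical window: horizontal lines on the left half, vertical lines on the right.
left_right = [[1.0 if (y < n // 2 and x % 2 == 0) or (y >= n // 2 and y % 2 == 0) else 0.0
               for y in range(n)] for x in range(n)]
# Circular shift by half a window along y: the two halves swap sides (Right-Left).
right_left = [[left_right[x][(y + n // 2) % n] for y in range(n)] for x in range(n)]

F1, F2 = dft2(left_right), dft2(right_left)
pairs = [(F1[u][v], F2[u][v]) for u in range(n) for v in range(n)]
same_power = all(abs(abs(a) ** 2 - abs(b) ** 2) < 1e-6 for a, b in pairs)
same_spectrum = all(abs(a - b) < 1e-6 for a, b in pairs)
print(same_power, same_spectrum)   # prints: True False
```

The positional information survives only in the phase (here $F_2(u,v) = (-1)^v F_1(u,v)$), which is the part of the transform the vector displays do not make readable.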

We have shown above that, in spite of the nonuniqueness of the
representation of bidirectional textures in the power spectrum, one can
construct a suitable algorithm for identification purposes by using
decomposition techniques.

We shall now give such an algorithm. Before we do so, we point out that
the domain of validity of the parameters associated with this descriptor
is given by the domain of validity of the parameters used for
monodirectional textures, except that the lower and upper bounds of $d$
(the difference between the two directions) change from
$\langle 0,\pi\rangle$ to $\langle \gamma,\pi-\gamma\rangle$. Moreover,
the peaks are defined in the same way as in the monodirectional algorithm.

Bidirectional Algorithm:

(1) Find the number $n$ of peaks of the function $P_r(\varphi)$. If
$n = 2$, then find the corresponding directions $\varphi_1$ and
$\varphi_2$ of the two peaks, else write the message: "This is not a
bidirectional texture, do further analysis", and go to the end.

(2) If $9\pi/10 > |\varphi_1 - \varphi_2| > \pi/10$, then go to step (3),
else write the message "This is a deformed monodirectional texture"
and go to the end.

(3) Partition region $R$ into four equally large subwindows
$R_1$, $R_2$, $R_3$, and $R_4$.

(4) Check whether each subwindow $R_i$, $i = 1,\dots,4$, is a
bidirectional textured region, using the algorithm "Bidirectional".
If the answer is yes, then set $MR_i$ On, otherwise set $MR_i$ Off.

(5) If $MR_i$ is On for all $i = 1,\dots,4$, then describe the given
window (region $R$) as a "bidirectional texture" and go to the end, else
go to step (6).

(6) If $MR_i$ is Off for all $i = 1,\dots,4$ and all the subwindows
are monodirectional, then describe the corresponding region as "Two
monodirectional textures with different directions are adjacent" and
go to the end, else issue the message "Further texture localization
is necessary" and go to the end.

END.
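A minimal sketch of the bidirectional decision logic follows. It assumes the peak directions of $P_r(\varphi)$ (in radians) have already been found for the region $R$ and for each of its four subwindows, together with a flag saying whether each subwindow was classified as monodirectional. To avoid unbounded recursion, the sketch applies only steps (1) and (2) to each subwindow when setting $MR_i$; that restriction and all function names are assumptions.

```python
import math

GAMMA = math.pi / 10   # fan-width threshold carried over from the monodirectional case

def separated(phi1, phi2, gamma=GAMMA):
    """Step (2): the direction difference d must lie in (gamma, pi - gamma)."""
    return gamma < abs(phi1 - phi2) < math.pi - gamma

def two_peak_test(peak_dirs):
    """Steps (1)-(2) only: exactly two peaks that are well separated."""
    return len(peak_dirs) == 2 and separated(*peak_dirs)

def bidirectional(region_peaks, sub_peaks, sub_is_mono):
    """region_peaks: peak directions over R; sub_peaks, sub_is_mono: data for R_1..R_4."""
    if len(region_peaks) != 2:
        return "This is not a bidirectional texture, do further analysis"
    if not separated(*region_peaks):
        return "This is a deformed monodirectional texture"
    mr = [two_peak_test(p) for p in sub_peaks]           # MR_i On/Off, steps (3)-(4)
    if all(mr):
        return "bidirectional texture"                   # step (5)
    if not any(mr) and all(sub_is_mono):
        return "Two monodirectional textures with different directions are adjacent"
    return "Further texture localization is necessary"   # step (6)
```

With horizontal and vertical peaks over $R$, four subwindows that each show only one of the two directions lead to the "adjacent" description, which is case (α).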

(c) Blob-like Texture

This descriptor is associated with blobs and their nonlinear
distribution. These two components go together: it is not sufficient to
have blobs as texture elements in order to infer a blob-like description.
For instance, blobs on a grid would show a directional texture, and the
'bloblikeness' would be very weak.

In the Fourier domain, blobs are represented by a concentric
energy distribution. The annulus with the greatest energy is the
peak annulus. In the implementation of the description (see the algorithm
below) we approximate areas of the transform by circles. The radius of
the approximating circle in the Fourier domain is inversely proportional
to the radius of the approximating circle in the spatial domain.

It  would  seem  logical  to  pass  from  mono-  and  bidirectional 
textures  to  tri-,  tetra-,  ...,  n-directional  textures,  before  turning 
to  blobs.  However,  it  is  very  hard  to  interpret  these  higher  order 
directionalities  in  the  spatial  domain. 

The blob-like algorithm describes blobs and their nonlinear
distribution. It is based on the assumption that patterns that show no
directionality, noise, or homogeneity are some sort of blob-like
texture. In the Fourier domain this assumption corresponds to two
conditions: first, $P_r(\varphi)$ is constant, and second, $P_\varphi(r)$
is not constant.


Algorithm Blob-like:

1. Form the functions

$$P_r(\varphi) = \sum_{r=1}^{WD/2} P(r,\varphi) \quad\text{and}\quad P_\varphi(r) = \sum_{\varphi=0}^{\pi} P(r,\varphi),$$

where WD is the window size, and compute their respective mean values:
$M_r$, the mean of $P_r(\varphi)$ over the sampled directions
$0 \le \varphi \le \pi$, and

$$M_\varphi = \frac{2}{WD}\sum_{r=1}^{WD/2} P_\varphi(r).$$

Next compute their variations $v_r$ and $v_\varphi$, the mean squared
deviations of $P_r(\varphi)$ from $M_r$ and of $P_\varphi(r)$ from
$M_\varphi$:

$$v_\varphi = \frac{2}{WD}\sum_{r=1}^{WD/2}\bigl(P_\varphi(r) - M_\varphi\bigr)^2,$$

and $v_r$ analogously over $0 \le \varphi \le \pi$.

2. If $M_r > CN$ and $M_\varphi > CN$, where CN is the noise level of the
TV camera, then go to step 3, else print the message: "The structure is
on the level of the camera noise" and go to END.

3. If $v_r > CN$, then go to step 7, else print the message: "All
energies are equally distributed in every direction" and go to step 4.

4. If $v_\varphi > CN$, then go to step 5, else print the message: "It is
a noisy texture" and go to END.

5. Find $\max_r P_\varphi(r) = P_\varphi(r_{max})$. If $r_{max} = 2$, then
print the message: "There is only one texture element"; go to step 6.

6. Form a new discrete function $I(i)$ from $P_\varphi(r)$ in the
following way. Assume that $P_\varphi(r)$ is a combination of sinc-type
functions. Find all the local maxima and all the minima of $P_\varphi(r)$.
For every local maximum $r_{max,i}$ there are two surrounding minima
$r_{1i}$ and $r_{2i}$ such that $r_{1i} < r_{max,i} < r_{2i}$; then

$$I(i) = \sum_{r=r_{1i}}^{r_{2i}} P_\varphi(r).$$

If $I(i)$ is a convex function, then print the message: "Texture elements
are blob-like" and go to step 7, else print the message: "There is an
unidentifiable texture" and go to END.

7. Assume that $P_r(\varphi)$ is a combination of sine functions. Find all
the local maxima of $P_r(\varphi)$; if their number is greater than
[illegible], issue the message: "[illegible] ... some directional
features", else print the message: "There is a blob-like texture".

END.

The above algorithm has been tested on an example shown in Fig. 20,
which should be self-explanatory.
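Steps 2 to 4 of the blob-like algorithm and the annulus energies $I(i)$ of step 6 can be sketched as below. The sketch assumes $P_r(\varphi)$ and $P_\varphi(r)$ are given as lists (the latter indexed from $r = 1$), reads "variation" as the mean squared deviation, and leaves out the convexity test on $I(i)$ because the text does not say how convexity of a discrete sequence is judged. The function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variation(xs):
    """Mean squared deviation from the mean (assumed reading of 'variation')."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def blob_like_cascade(p_r, p_phi, cn):
    """Steps 2-4. Returns (message, next_step); next_step None means END."""
    if not (mean(p_r) > cn and mean(p_phi) > cn):
        return ("The structure is on the level of the camera noise", None)
    if variation(p_r) > cn:
        return ("", 7)                    # directional energy varies: examine P_r(phi)
    message = "All energies are equally distributed in every direction"
    if variation(p_phi) > cn:
        return (message, 5)
    return ("It is a noisy texture", None)

def annulus_energies(p_phi):
    """Step 6: I(i) = sum of P_phi(r) between the two minima around each local maximum."""
    n = len(p_phi)
    out = []
    for k in range(n):
        rising = k == 0 or p_phi[k] > p_phi[k - 1]
        falling = k == n - 1 or p_phi[k] >= p_phi[k + 1]
        if rising and falling:
            lo = k
            while lo > 0 and p_phi[lo - 1] < p_phi[lo]:
                lo -= 1                   # walk down to the inner minimum r_1i
            hi = k
            while hi < n - 1 and p_phi[hi + 1] < p_phi[hi]:
                hi += 1                   # walk down to the outer minimum r_2i
            out.append(sum(p_phi[lo:hi + 1]))
    return out
```

For a radial profile `[5, 1, 3, 1, 2]` the annulus energies are `[6, 5, 3]`; neighbouring annuli share their common minimum, matching the inclusive sum from $r_{1i}$ to $r_{2i}$.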


(d) Noisy (Random) Texture


The  spatial  case: 

A random distribution of dots (a pepper-and-salt pattern) forms
a model of noisy texture. This model describes the random spatial
organization of dot texture elements, as opposed to a periodic or regular
distribution of texture elements.

Fourier  Case: 

The  texture  in  this  model  corresponds  to  a  homogeneous 
distribution  of  energies  in  the  power  spectrum. 

The descriptor "noisy" is associated with certain parameters
(obeying some threshold constraints), explained below:

EL denotes the ratio of the size of the one-dot texture element to the
size of a real texture element. The inequality $EL < WD/4 + CN$ means that
a texture element with an area of two dots will still be a dot texture
element when WD (the window size) is 8×8 points.

ED is the parameter of random distribution. ED is the ratio of $EM$ and
$M_r$, where

$$EM = \max_{\varphi}\bigl|P_r(\varphi) - M_r\bigr|$$

and $M_r$ is the mean value of $P_r(\varphi)$, as in the algorithm
Blob-like. The value of ED is set to be $ED < 0.1 + CN$.

CN is the noise of the TV camera.

Algorithm Random:

1. Form the functions $P_r(\varphi)$, $P_\varphi(r)$, $M_r$, $M_\varphi$,
$v_r$, and $v_\varphi$ as described in the algorithm Blob-like.

2. If $M_r < CN$ or $M_\varphi < CN$, then write the message: "The texture
structure is on the TV camera noise level, check if it is a homogeneous
texture" and go to END, else go to step 3.

3. If $v_r > ED$, then write the message: "There is no random
distribution" and go to END, else go to step 4.

4. If $v_\varphi > EL$, then write the message: "There might be a
blob-like texture" and go to END, else write the message: "The texture is
a randomly distributed dot pattern".

END.
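The random-texture test can be sketched as follows. Two readings in this sketch are assumptions: step 3 is implemented by computing $ED = EM/M_r$ and comparing it with its stated bound $0.1 + CN$, and step 4 compares $v_\varphi$ with the bound $WD/4 + CN$ given for EL. The inputs are lists holding $P_r(\varphi)$ and $P_\varphi(r)$; the function name is hypothetical.

```python
def random_texture(p_r, p_phi, cn, wd):
    """Algorithm Random on P_r(phi) and P_phi(r) given as lists; wd is the window size."""
    m_r = sum(p_r) / len(p_r)
    m_phi = sum(p_phi) / len(p_phi)
    if m_r < cn or m_phi < cn:                                   # step 2
        return ("The texture structure is on the TV camera noise level, "
                "check if it is a homogeneous texture")
    ed = max(abs(p - m_r) for p in p_r) / m_r                    # ED = EM / M_r
    if ed > 0.1 + cn:                                            # step 3 (assumed reading)
        return "There is no random distribution"
    v_phi = sum((p - m_phi) ** 2 for p in p_phi) / len(p_phi)
    if v_phi > wd / 4 + cn:                                      # step 4 (EL at its bound)
        return "There might be a blob-like texture"
    return "The texture is a randomly distributed dot pattern"
```

An almost flat directional profile together with an almost flat radial profile is classified as a randomly distributed dot pattern, while a single dominant direction fails the ED test.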

The algorithm has been tested on an example shown in Fig. 21. The
picture  in  the  left  upper  corner  is  a  texture  of  sand;  the  picture  in 
the  right  upper  corner  is  the  resynthesized  image  of  the  original,  according 
to  the  description.  Finally,  the  left  lower  corner  shows  the  power 
spectrum  of  the  original. 


(e) Homogeneous Texture


A homogeneous region of uniform brightness (in a black-and-white
picture) or uniform color (in a colored picture) forms a model of a
homogeneous texture.

The Fourier counterpart is represented by a Dirac function
centered at the origin of the coordinate system.

The  only  threshold  parameter  in  this  model  is  the  level  of 
the  TV  camera  noise  (CN). 

Algorithm Homogeneous:

1. Form the function

$$\text{noise} = \sum_{\varphi=0}^{\pi}\sum_{r=0}^{WD/2} P(r,\varphi) - P(0,0),$$

where $P(0,0)$ is also called the DC value.

If $\text{noise} < CN$, then write the message: "The texture is
homogeneous", else write the message: "The texture is not homogeneous".

END.
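The homogeneity test compares all non-DC energy with the camera noise. A minimal sketch, assuming a square window as a list of lists and a direct DFT normalized by the number of points; for simplicity it sums over the whole spectrum instead of the half-plane $0 \le \varphi \le \pi$, $r \le WD/2$, which is an assumption. With this normalization, Parseval's theorem makes the non-DC energy equal to $\sum (f - \bar f)^2$, the squared deviation of the window from its mean brightness. The function name is hypothetical.

```python
import cmath
import math

def homogeneous(win, cn):
    """Algorithm Homogeneous on a square window; returns (verdict, noise, mean brightness)."""
    n = len(win)
    power = [[abs(sum(win[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                      for x in range(n) for y in range(n))) ** 2 / (n * n)
              for v in range(n)] for u in range(n)]
    noise = sum(map(sum, power)) - power[0][0]       # all energy except the DC value P(0,0)
    mean_brightness = sum(map(sum, win)) / (n * n)   # extra parameter attached to the description
    verdict = "The texture is homogeneous" if noise < cn else "The texture is not homogeneous"
    return verdict, noise, mean_brightness
```

A uniform 8 × 8 window of brightness 5 is reported as homogeneous with mean brightness 5.0; a single bright point of value 8 on a dark window leaves a non-DC energy of 63.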

An additional parameter, the average light intensity over a particular
window, is associated with every description of a homogeneous texture.


The following table summarizes the texture descriptors we have
implemented:

Table

DESCRIPTORS        PARAMETERS

Monodirectional    DIR - direction of lines
                   ww - distance between two parallel lines
                   $\gamma$ - a measure of straightness of a line
                   $E_{DIR}/(E - E_{DIR})$ - ratio of the energy in the directional fan to the remaining energy
                   t - a measure of "thickness" of a line

Bidirectional      $DIR_1$, $DIR_2$ - directions of lines $l_1$, $l_2$, respectively
                   $ww_1$, $ww_2$ - distances between two parallel lines in the two different directions $DIR_1$ and $DIR_2$
                   $\gamma_1$, $\gamma_2$ - measures of straightness of lines $l_1$, $l_2$
                   $t_1$, $t_2$ - measures of "thickness" of lines $l_1$, $l_2$, respectively
                   $d = DIR_1 - DIR_2$
                   Comment: lines $l_1$ and $l_2$ are assumed to be nonparallel.

Bloblike           R - the distance between two texture elements in direction DIR

(continued)


4. COLORED AND TEXTURED REGIONS

In the previous chapter, we discussed procedures for computing texture
descriptors. This chapter describes the determination of textured and
colored regions and introduces a mathematical description, topological
sheaves, to formalize the region-forming process. The texture descriptors
are used to form regions with similar descriptors. The region growing is
low-level in that it does not use the context of a world model; it is
intended as a tool for higher-level routines. The proposed regions
function as initial guesses about important areas of the image. Thus, the
routines favor large regions at the expense of smaller ones, a sort of
"law of the fishes": the big ones eat the small ones. Since there are few
useful texture descriptors and organization procedures, this attention to
low-level modules was a necessary focus of our research.

The informal distinction between low-level and high-level processes
refers to the context that the process takes into account. Roughly, by
low-level we mean that the context is local and based on the image, and
by high-level we mean an object-space interpretation that depends on
several levels of abstraction and on (global) relations. We would like a
larger arsenal of texture descriptors and low-level organization
mechanisms. However, it is important to have a balance between the
low-level and higher-level systems, and to design for their
communication.

We emphasize that we use a technique in which we start from large windows
and take smaller windows at boundaries. This approach has the limitation
of missing substructure. The microscopic approach of starting from small
windows and trying to piece together global structure has the
complementary weakness of missing global order. Overall, we think this
points up the need for a range of window sizes for local organization.
For texture, we prefer our approach to the microscopic one.

As  the  scanner  traverses  across  the  picture  in  a  television-like  raster 
scan,  the  local  texture  descriptors  (these  descriptors  might  be  spatial, 
or  histogram,  etc.,  in  addition  to  those  we  use) over  each  window 
are  sent  to  the  program  which  detects  the  appearance  of  similarities  or 
dissimilarities  of  the  structures,  over  the  given  pair  of  windows.  The 
knowledge  of  the  existence  of  similarities  is  retained  together  with 
locations.  All  the  windows  with  similar  structures  are  joined  together 
by  a  two-way  list  which  is  constructed  during  the  scanning  process.  The 
program  also  detects  the  break  of  similarities  between  two  structures  and 
gives  a  command  to  the  scanner  to  scan  with  windows  of  smaller  size. 

When  two  windows  are  joined  or  split  apart,  different  texture  names 
are  assigned  to  them.  Each  structure  associated  with  a  window  is  tested 
to  determine  its  similarity  with  other  structures  or  its  proper  association 
with  the  existing  similarity  classes. 

Region  boundaries  do  not  usually  coincide  with  the  grid  windows,  and 
hence  there  occurs  both  merging  of  two  adjacent  areas,  and  the  splitting 
of  an  area  into  at  least  two  portions. 

In  this  work  a  set  of  real  life  and  artificial  pictures  was  scanned 
and processed by our program to demonstrate the capability of the implemented
Fourier  method.  The  results  of  testing  indicate  that  our  method  is 
capable  of  decomposing  pictures  into  regions,  where  each  region  corresponds 
to  a  different  texture  or  color. 

In our implementation of the region growers, the emphasis was on testing
some of the ideas and not on programming efficiency. However, for
illustration we present in Table 6 the average time and memory load of
our programs. The programs have been implemented on a PDP-10 at the
Artificial Intelligence Project, Stanford University.


Table 6

NAME OF THE PROGRAM   FUNCTION                                           SIZE OF THE PICTURE   CPU TIME (min)   CORE (k)
FANAL.SAI             Fast Fourier transform and segmentation            256 x 128             ?                ?
TEXTUR.SAI            Texture analyses on windows (32 x 32 points)       256 x 128             ?                ?
MIKROA.SAI            Localization texture analysis of windows (8 x 8)   256 x 128             ?                ?
[?]REE.SAI            Texture region grower                              256 x 128             ?                ?
COLOR.SAI             Color region grower                                192 x 128             ?                ?


4.1 An Algorithm for Finding Regions

The process of localization of structures was described in detail in
Section 3.3. Here we focus our attention on finding the connections
between local structures in terms of continuity, discontinuity, and
proximity. This is done by a region grower, which we describe next. The
region grower can be used both for continuous textured regions and for
continuous colored regions. The algorithms for our region grower use the
principle of local constancy, whose content is summarized in the phrase:
"Unite connected locally similar areas into one global one." Our
algorithm uses the notion of a cell, which is simply any window of the
smallest possible size that still carries meaningful information.

Algorithm  "Region  Finder" 

1. Set the regional index $i$ to 1 and produce a mark $R_i$.

2. Take the first untested cell and call it the first pilot cell
(which thereby is also a pilot cell).

3. Set XSIDE to be RIGHT SIDE, YSIDE to be LEFT SIDE, and XADJ to be
RIGHT ADJACENT.

4. If the pilot cell has been tested for its XADJ cell, then go to
step 8, otherwise mark the pilot cell with a mark signifying that it
has been tested on its XSIDE, and continue in step 5.

5. Find the next XADJ cell. Check whether this new cell lies outside
the picture or has already been tested on its YSIDE. If the answer is
NO, continue in step 6, else go to step 8.

6. If the pilot cell and the adjacent cell are similar, then continue
in step 7, else mark the pilot cell on its XSIDE, indicating that it has
been tested, and go to step 8.

7. Join the two cells (the pilot cell and the new cell), mark the new
cell with the mark $R_i$, and indicate that it has been joined on its
YSIDE. Store the new cell in an array of new cells. Make the cell a pilot
cell and go to step 8.

8. If XSIDE is the RIGHT SIDE, then set XSIDE to be LEFT SIDE, YSIDE
to be RIGHT SIDE, and XADJ to be LEFT ADJACENT, and go to step 9. If XSIDE
is the LEFT SIDE, then set XSIDE to be UPPER SIDE, YSIDE to be LOWER SIDE,
and XADJ to be UPPER ADJACENT, and go to step 9. If XSIDE is the UPPER
SIDE, then set XSIDE to be LOWER SIDE, YSIDE to be UPPER SIDE, and XADJ to
be LOWER ADJACENT, and go to step 10.

9. Set the pilot cell to be the first pilot cell and go to step 4.

10. Take the array of new cells. Take the index $j$ (initially $j = 0$)
and increase it by 1. If $j$ exceeds the number of all new cells, then go
to step 11, else take the element $n_j$ from the array of new cells, make
it the first pilot cell, and go to step 3.

11. Zero the array of new cells. If there is any cell in the picture
that has not yet been tested, then increase the region index $i$ by 1,
make a new mark $R_i$, and go to step 2, else go to the end.

END.
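Read as a whole, the Region Finder's bookkeeping (pilot cells, side marks, and the array of new cells) performs a breadth-first flood fill over 4-adjacent cells. The sketch below is a compact equivalent under that reading, not a transcription of the original program: cells are held in a grid, the similarity test is supplied by the caller, every new cell is tested on all four sides (right, left, upper, lower), and the name `region_finder` is hypothetical. Like the algorithm above, it compares each new cell with the pilot cell that reached it, so a non-transitive similarity test can let a region drift gradually.

```python
from collections import deque

def region_finder(cells, similar):
    """Label 4-connected regions of similar cells; returns a grid of region indices 1, 2, ..."""
    rows, cols = len(cells), len(cells[0])
    label = [[0] * cols for _ in range(rows)]        # 0 = not yet tested
    region = 0
    for i0 in range(rows):
        for j0 in range(cols):
            if label[i0][j0]:
                continue
            region += 1                              # steps 1 and 11: new mark R_i
            label[i0][j0] = region
            new_cells = deque([(i0, j0)])            # step 2: first pilot cell
            while new_cells:                         # step 10: new cells become pilot cells
                i, j = new_cells.popleft()
                for di, dj in ((0, 1), (0, -1), (-1, 0), (1, 0)):   # right, left, upper, lower
                    a, b = i + di, j + dj
                    if (0 <= a < rows and 0 <= b < cols and not label[a][b]
                            and similar(cells[i][j], cells[a][b])):
                        label[a][b] = region         # step 7: join and mark with R_i
                        new_cells.append((a, b))
    return label
```

With equality as the similarity test, the grid `[[1, 1, 5], [1, 5, 5], [9, 9, 5]]` splits into three regions labeled `[[1, 1, 2], [1, 2, 2], [3, 3, 2]]`.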




4.2 Texture Regions

This algorithm has been tested on textured regions as well as on colored
regions. The scanning process is shown, for instance, in Fig. 22, with
white squares each representing a window of 32 x 32 points. Fig. 22
displays the boundaries of the different textured regions of the picture
shown in Fig. 23 after the first pass. One can see the different sizes
of windows.

Over every window there are several descriptors and parameters.
Since we used several window sizes (32, 16, 8) and some of the parameters
are size dependent, we reduced all descriptors and parameters to the
smallest window size (8). Then the criteria of similarity had to be set.


The criteria of similarity are set by the higher-level program. In
our work we used two approaches, not mutually exclusive but rather
complementary.

One approach used only black-and-white pictures and did not assume
any previous knowledge about the scene. The similarity criteria were
determined by the camera noise and the expected error of the method. The
whole region growing was based only on the similarities of certain
geometric properties described by the Fourier texture operator. The
results of this approach are displayed in Figs. 24 and 26, where one can
see that while this approach is sufficient for separating regions in
simple, more or less artificial scenes (the rastered cube in Fig. 23,
the cube on a grid surface in Fig. 25), it is not adequate for finding
the boundaries of regions in real outdoor scenes. In the latter case one
needs to know more about the scene and thus to conduct directed texture
region growing or texture boundary detection.

Directed texture region growing and/or boundary detection is the other
approach that we used. It uses information gained through a color region
grower. This information directs the application of the texture operator
for two purposes: one is to look for a common texture where the colors
are the same or proximal; the other is to look for texture differences
where there are color boundaries.

This approach identifies the real regions and their boundaries more
efficiently. The example in Fig. 27 shows the different textured regions
of the original picture displayed in Fig. 1. Most of the grass region
came out as directional texture. Only two areas within the grass region
(one on the left side and the other on the right side) were identified as
noisy texture, though with the same direction as the directional textures.
Further verification of the continuity between those two textures is
required in order to remove the boundaries.

The  main  difference  between  the  two  approaches  is  that  in  the  latter 
we  use  the  texture  operator  in  a  directed  way.  This  means  that  as  well  as 
applying  the  texture  operator  only  in  certain  areas  (not  all  over  the  picture), 
we  also  have  the  choice  of  asking  for  continuity  and  proximity  in  several 
descriptors  and  parameters  independently. 




4.3 Color Regions


As for textured regions, the region-growing algorithm has been used for
colored regions. The colored picture consists of three files, each
representing the brightness through the red, green, and blue filters.
We use the normalized color values for each point, $R/(R+G+B)$ and
$B/(R+G+B)$, where $R$, $G$, $B$ are the intensities through the red,
green, and blue filters, respectively. As in the texture region grower,
we again use windows over which the average values of $R/(R+G+B)$ and
$B/(R+G+B)$ are computed. The size of the windows depends on the
structure of the picture; we have chosen 8×8. The windows overlap, so
that continuity is checked strictly. The threshold value that determines
the similarity criterion depends on the resolution of the picture as well
as on the window size. In our case, it is set to 2, provided that we deal
with 6-bit pictures. The example in Fig. 44 shows the result of the color
region grower described above, applied to the picture in Fig. 1. The
original picture has only 4-bit resolution, so the threshold has to be
different (0.75). Otherwise everything is the same.
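The color features can be sketched as follows, assuming the three filtered pictures are given as equally sized lists of lists. The window step of 4 (so that 8 × 8 windows overlap), the rule that both normalized components must agree within the threshold, and the example threshold of 0.05 are assumptions; the text's threshold of 2 for 6-bit pictures refers to a scaling of the normalized values that it does not specify. All function names are hypothetical.

```python
def chromaticity(r, g, b):
    """Normalized color values R/(R+G+B) and B/(R+G+B) of one point."""
    s = r + g + b
    return (r / s, b / s) if s else (0.0, 0.0)

def window_means(R, G, B, size=8, step=4):
    """Average normalized (r, b) over size x size windows placed every `step` points."""
    rows, cols = len(R), len(R[0])
    means = {}
    for i in range(0, rows - size + 1, step):
        for j in range(0, cols - size + 1, step):
            pts = [chromaticity(R[x][y], G[x][y], B[x][y])
                   for x in range(i, i + size) for y in range(j, j + size)]
            means[(i, j)] = (sum(p[0] for p in pts) / len(pts),
                             sum(p[1] for p in pts) / len(pts))
    return means

def similar_color(m1, m2, threshold):
    """Assumed similarity rule: both normalized components agree within threshold."""
    return abs(m1[0] - m2[0]) <= threshold and abs(m1[1] - m2[1]) <= threshold
```

The window means can feed a region grower directly, with `similar_color` (at a fixed threshold) as its similarity test. Because the values are normalized, a uniform change of illumination leaves them unchanged.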




4.4 A Sheaf-Theoretic Point of View of Finding Regions

The geometric analysis of pictures, in particular the partition of a
picture into regions, can be neatly presented in the language of sheaves
(for details see the APPENDIX). From a sheaf-theoretic point of view,
the region identification process is based on an assignment of
structures to windows (the local structure) and on passing from LOCAL
STRUCTURES (over windows) to GLOBAL ONES (regions). Thus, each region is
specified by one sheaf. Over every window we can have several different
descriptors, and thereby different structures. Each of these structures
will partition the picture in a different way. These different
partitionings of the picture, described by different sheaves, correspond
to the different layers of description of the picture. Naturally, the
sheaves could be interconnected through some connecting mappings. The
difficulty in making use of the structure of sheaves in scene analysis is
that we usually do not know the connecting mappings between two different
sheaves.

The sheaves constitute a vehicle for checking the continuity and
proximity of structures with respect to some well-defined connected mapping.
In a concrete application of a texture region grower, this mathematical
tool has the following limitations:

i) If the structure is a texture, then it will find the continuity
in the texture, but it will find discontinuity in the texture element.
Thus the smallest window size must be restricted to the size of the texture
elements.

ii) Sheaf theory assumes that the structures over every two
windows which are in an inclusion relationship are related by a connected
mapping. However, in reality the different positions of windows may cause




false continuities or discontinuities. One has to use several
different overlapped windowings in order to overcome this error.

The  contribution  of  the  sheaf  point  of  view  to  region  growing  is 
that  it  defines  precisely  the  conditions  for  continuity  and  discontinuity 
of  a  structure  with  respect  to  some  connected  mapping.  The  sheaf  theory 
shows  that  if  the  structures  from  two  (overlapped)  windows  and  their 
overlapped  part  are  connected  by  the  mapping,  then  the  union  of  these  two 
windows  is  continuous  with  respect  to  the  structure  and  the  mapping.  It 
is  interesting  that  the  sheaf  conditions  are  similar  to  natural  continuity 
conditions  for  use  of  the  Fourier  power  spectrum. 

In  most  of  our  applications  (texture  or  color  region  grower),  the 
connected  mapping  is  the  local  similarity  relationship  (it  must  be  an 
equivalence  relation).  Naturally,  the  theory  allows  much  more  complicated 
mappings  as  well  as  structures. 
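The gluing test described above can be made concrete for two square windows. In this sketch, which is only an illustration and not the author's program, a window is a set of pixel coordinates, the structure over a window is a single descriptor (here the mean intensity), and the connected mapping is "falls into the same quantization bin". Quantizing is one simple way, assumed for this sketch, to make the local similarity an equivalence relation as required above; a plain tolerance test |a - b| <= t is not transitive. The helper names are hypothetical.

```python
import numpy as np


def square(y, x, size):
    """The window of side `size` whose top-left pixel is (y, x)."""
    return frozenset((y + i, x + j) for i in range(size) for j in range(size))


def same_bin(bin_width):
    """Equivalence relation: two descriptor values fall into the same bin."""
    return lambda a, b: int(a // bin_width) == int(b // bin_width)


def can_glue(v, w, descriptor, related):
    """Sheaf-style continuity test for two overlapping windows v and w.

    The structures over v and over w must each be related to the structure
    over their overlap; then the union v | w is continuous for this descriptor.
    """
    overlap = v & w
    if not overlap:
        return False  # no common part on which to check continuity
    d_overlap = descriptor(overlap)
    return related(descriptor(v), d_overlap) and related(descriptor(w), d_overlap)


# Usage on a synthetic 8 x 16 image with a step edge between columns 7 and 8.
image = np.zeros((8, 16))
image[:, 8:] = 10.0


def mean_intensity(pixels):
    """Descriptor: mean gray level over a set of pixels."""
    return float(np.mean([image[p] for p in pixels]))


print(can_glue(square(0, 0, 4), square(0, 2, 4), mean_intensity, same_bin(2.0)))  # True
print(can_glue(square(0, 4, 4), square(0, 6, 4), mean_intensity, same_bin(2.0)))  # False
```

The second pair fails because the window reaching across the edge has a mean of 5, which is not related to the mean of 0 over the overlap.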

After this discussion, let us present sheaf theory more formally.
The topology we shall use is discrete and is induced by certain norms,
taken from the structure to integers. Once the topology is fixed, we
introduce a convenient system of neighborhoods, called windows. We think
of windows as a system partially ordered by inclusion. Procedures which
evaluate the data over the windows assign to every window a structure of
descriptors. When two windows, say $v$ and $w$, are in the inclusion
relationship $v \subseteq w$, the corresponding networks of descriptors
$N_w$ and $N_v$ are related by a connecting mapping

$$ \rho^w_v : N_w \to N_v , $$

which essentially restricts the network over the bigger window to a network




on the smaller window. Since the process of restriction is transitive,
one obtains by this formalization a PRESHEAF associated with the image
function

$$ N = \langle N_v, \rho^w_v \rangle_{v \subseteq w} . $$

Sheaves  are  presheaves  satisfying  additional  axioms.  A  definition  of 
a  sheaf  in  its  full  generality  requires  several  additional  technicalities. 
A  more  direct  definition  of  a  sheaf  with  a  fairly  clear  picture-theoretic 
interpretation  is  given  below. 

Thus, loosely speaking, a sheaf is a system of structures over a
lattice of windows, where each structure represents one particular texture.

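Consider a presheaf $S = \{S_V, \rho^V_U\}$ of structures over a cellular space
$X$, i.e., on the lattice of subsets $\langle \mathrm{Sub}(X), \subseteq \rangle$.
Then $S$ is a sheaf over $X$ precisely when, for any family
$\{V_i \mid i \in I\}$ of subsets of $X$ with $V = \bigcup_i V_i$,
the following two conditions are satisfied:

(1) Uniqueness axiom:
$$ \forall i\,\big[\rho^V_{V_i}(s') = \rho^V_{V_i}(s'')\big] \Rightarrow s' = s'' ; $$

(2) Existence axiom:
$$ \forall i,j\,\big[\rho^{V_i}_{V_i \cap V_j}(s_i) = \rho^{V_j}_{V_i \cap V_j}(s_j)\big] \Rightarrow \exists s\,\forall k\,\big[\rho^V_{V_k}(s) = s_k\big] , $$

where $s, s', s'' \in S_V$, $s_i \in S_{V_i}$, $s_j \in S_{V_j}$, $s_k \in S_{V_k}$,
and $i, j, k \in I$.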

Condition (1) says that if the structure elements $s$ are locally
identical, then they are also globally identical. That is, elements are
uniquely determined by local data.

Condition (2) says that if we have local data which are compatible,
they actually "patch together" to form global data.


The geometric meaning of axioms (1) and (2) is displayed below in
Fig. 28 and Fig. 29.
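The two axioms can be checked mechanically for the simplest sheaf, the sheaf of label-valued functions on pixels: a section over a window is a mapping from each of its pixels to a label, and restriction simply forgets the pixels outside the smaller window. The sketch below is a textbook illustration with hypothetical names, not the descriptor networks used in the text. Compatible local sections glue to a section over the union (existence), and that section is the only one with the given restrictions, because every pixel's label is fixed by some local section (uniqueness).

```python
from itertools import combinations


def restrict(section, subset):
    """Restriction map: keep only the pixels of `section` lying in `subset`."""
    return {p: section[p] for p in subset}


def compatible(sections):
    """Hypothesis of the existence axiom: agreement on every pairwise overlap."""
    for s_i, s_j in combinations(sections, 2):
        overlap = s_i.keys() & s_j.keys()
        if restrict(s_i, overlap) != restrict(s_j, overlap):
            return False
    return True


def glue(sections):
    """Patch local sections into one global section, or return None."""
    if not compatible(sections):
        return None
    whole = {}
    for s in sections:
        whole.update(s)
    return whole
```

Restricting the glued section back to each window returns the original local section, which is exactly the conclusion of the existence axiom.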




5. Interpretation of Outdoor Scenes


The main issue in this chapter is how to recognize and interpret
real outdoor scenes of grass, water, sky, etc.

5.1 Pattern Recognition Approach

In an early stage of our research, we tried to recognize texture
using a pattern recognition method (Bajcsy, 1970). We computed a function
of energy (E) along the frequencies (f) and derived a feature vector from this
function. The features were the number of peaks, their energies, their
widths, and their corresponding frequencies. In addition, we characterized
the function as flat or with peaks. These features were used for classification
of the texture into classes: grass, water, regular pattern
(like blobs, brick wall) and unidentified. As an example, grass and
water had flatter functions than the regular patterns. Samples of the
function of energy versus frequency for textures of grass, water,
brick wall and blobs are displayed in Figs. 30-33.




Each figure consists of two graphs. One is the function (energy versus
frequency) computed in the window without any preprocessing (indexed by
(a)), and the other is the same function computed from data
which were preprocessed (indexed by (b)). Preprocessing, in the case
of grass and water, was high-pass filtering. The purpose of the
preprocessing was to eliminate the effects of shadows on grass or water.

For the regular patterns, the preprocessing consisted of low-pass
filtering. The purpose of this filtering was to enhance the main frequency
components of a regular pattern and suppress the noise.
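The feature extraction and preprocessing described above might look like this in outline. It is a sketch under assumptions, not the 1970 program: the energy function E(f) is taken as the Fourier power summed over rings of integer radius f (in cycles per window), a peak is a local maximum holding at least a quarter of the largest energy, its width is the number of frequencies above half its height, the function counts as flat when no frequency carries three times the mean energy, and both filters are 3x3 box filters. All thresholds and names are hypothetical.

```python
import numpy as np


def box_blur(window, k=3):
    """Low-pass filter: mean over a k x k neighborhood (edges replicated)."""
    window = np.asarray(window, dtype=float)
    pad = k // 2
    padded = np.pad(window, pad, mode="edge")
    h, w = window.shape
    acc = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (k * k)


def high_pass(window, k=3):
    """High-pass filter: the window minus its low-passed version."""
    return np.asarray(window, dtype=float) - box_blur(window, k)


def energy_by_frequency(window):
    """E(f) for f = 1 .. N/2: power summed over rings of radius f (DC excluded)."""
    window = np.asarray(window, dtype=float)
    power = np.abs(np.fft.fftshift(np.fft.fft2(window - window.mean()))) ** 2
    h, w = power.shape
    yy, xx = np.indices((h, w))
    radius = np.rint(np.hypot(yy - h // 2, xx - w // 2)).astype(int)
    energy = np.bincount(radius.ravel(), weights=power.ravel())
    return energy[1:min(h, w) // 2 + 1]


def spectral_features(energy, rel_height=0.25, flat_ratio=3.0):
    """Peaks of E(f) as (frequency, energy, width) plus a 'flat' flag."""
    top = float(energy.max())
    if top <= 0:
        return {"n_peaks": 0, "peaks": [], "flat": True}
    peaks = []
    for i, e in enumerate(energy):
        left = energy[i - 1] if i > 0 else -np.inf
        right = energy[i + 1] if i + 1 < len(energy) else -np.inf
        if e > left and e >= right and e >= rel_height * top:
            lo = hi = i
            while lo > 0 and energy[lo - 1] > e / 2:
                lo -= 1
            while hi + 1 < len(energy) and energy[hi + 1] > e / 2:
                hi += 1
            peaks.append((i + 1, float(e), hi - lo + 1))
    flat = top < flat_ratio * float(energy.mean())
    return {"n_peaks": len(peaks), "peaks": peaks, "flat": bool(flat)}
```

For a 32x32 window of stripes with a period of 8 pixels, the sketch reports a single narrow peak at 4 cycles per window and a non-flat function, the kind of signature that separated regular patterns from grass and water.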

By this method we could distinguish well the regular patterns (or
man-made patterns) from the natural textures encountered in outdoor
scenes. It was more difficult to distinguish the water from the grass
unless the main frequency component was sufficiently different. The
training feature vector was extremely sensitive to differences in how
the picture was taken, in particular the distance between the observer
and the scene, and the orientation of the observer (whether he is on the
ground or in an airplane) with respect to the scene. This method did
not consider any corrections for texture gradient. It simply classified
some areas of a scene into some given classes of textures.

We  could  have  improved  the  feature  vector  using  further  features 
similar  to  Lendaris'  and  thus  enlarged  and  refined  the  classification 
procedure  of  texture.  We  did  not  do  it  for  the  following  reasons: 

(i) Feature vectors offer a very specific and rigid description of
a texture, which is an obstacle in finding continuity of
textured regions unless the texture is a very regular pattern
without any features such as texture gradient. Naturally, one
can construct less specific feature vectors, but then the
sensitivity to the differences between two different textures
will be lessened, which in general is not desirable. To sum
up, in the texture region finder one needs flexibility in
choosing features for grouping or discriminatory
purposes. One also wants to have symbolic descriptions with
parameters, as opposed to only numeric descriptions (as in
the feature vector). The symbolic description (if properly
chosen) is invariant with respect to several metric (scalar)
features and thus represents a certain abstraction which
is useful for recognition purposes.

(ii) The classification of textures into classes uses, besides the
feature vectors, some distance measurements between the
training feature vector and the sample feature vector. This
process does not consider any topological properties of windows,
namely connectivity, continuity and proximity. Furthermore, a
metric description of a real texture is not sufficient for
identification purposes. For instance, grass is identified
as grass not only because of its color or the geometry of its
texture but also through its spatial relationship with other
objects in the scene (e.g. grass is always on the ground,
below the sky, etc.).

A different approach had to be sought for describing textures: an
approach that would give symbolic descriptions of a texture together with
its parameters, and would find continuous regions with respect to their
descriptions. In Chapter 3 we have described the texture operator that
produces such a description. This operator can function on different
window sizes. The large windows capture the global textures, whereas
the small windows are used for recognition of fine texture that is not
noticed in the large window. The continuity and proximity of some
structures are the basic properties used in a region grower. So far, we
have talked mostly about the texture structure. However, the structure that
forms a region could depend on many properties, such as color, shape, size,
and others.

5.2 Texture Gradient

Many elements of the world are made up of texture elements of a
constant size (grass, brick walls, wheat, water waves). The apparent
size of texture elements depends upon distance. Although there is a
chance for mistakes, it is natural to interpret consistent variation in
apparent size of texture elements as a measure of relative distance.

If there is little variation, the interpretation is that the surface is
everywhere approximately at the same distance from the observer. Such
surfaces are nearly perpendicular to the line of sight and are called
frontal surfaces. If there is a systematic variation of apparent size
of texture elements, smaller elements are assumed to be further away. Such
a texture gradient suggests that the surface is longitudinal, that is,
along the line of sight. The presence or absence of a systematic texture
gradient gives a rough indication of the angle, curvature, and
relative distance of objects. The role of texture gradient in human
perception of depth has been described by Gibson (1950).
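The distinction between frontal and longitudinal surfaces can be sketched with a least-squares line through apparent element sizes measured at several image rows. The 10% tolerance on the relative change of size across the sampled span and the function name are assumptions of this sketch; rows are counted from the top of the image and, as in the text, smaller elements are taken to be farther away.

```python
import numpy as np


def classify_surface(rows, sizes, tolerance=0.1):
    """Label a surface 'frontal' or 'longitudinal' from apparent element sizes.

    rows: image row of each measurement (0 = top of the image).
    sizes: apparent texture element size measured at that row.
    Returns (label, farther_end), where farther_end is 'top', 'bottom' or None.
    """
    rows = np.asarray(rows, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    slope, _intercept = np.polyfit(rows, sizes, 1)
    # relative change of apparent size across the sampled span of rows
    span_change = slope * (rows.max() - rows.min()) / sizes.mean()
    if abs(span_change) < tolerance:
        return ("frontal", None)
    # smaller elements are assumed to be farther away
    return ("longitudinal", "top" if slope > 0 else "bottom")
```

A systematic growth of element size toward the bottom of the image yields ("longitudinal", "top"), the usual case for ground seen from a standing observer.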




Fig. 34

In Figure 34, surface AB is a longitudinal surface and surface BC
is a frontal surface. In the image there exists a gradient of texture,
from coarse to fine along ab, whereas in the image no such gradient
occurs along bc, and the texture is uniform throughout.

The texture gradient can be used as a measuring stick whose scale
we don't know, but which gives us relative depth estimates:
B is twice as far as A.

For familiar surfaces for which we know the texture element size, the
scale of the measuring stick is known, and we have an estimate of absolute
distance (provided we have an estimate of the surface angle with regard
to the observer's image plane; we shall show shortly that we can determine
that angle). Since the observer knows his orientation with regard to



gravity, by assuming a level ground plane, he can estimate the distance
of areas near his feet with reasonable accuracy. This helps in establishing
the absolute size of grass and other textures on the ground.
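The measuring-stick argument reduces to two one-line formulas if we assume, for this sketch, that the elements face the observer (no slant correction) and that apparent size is inversely proportional to distance. The angular resolution (radians per picture element) and the function names are hypothetical parameters, not values from the text.

```python
def relative_distance(size_a, size_b):
    """How many times farther B is than A, from apparent element sizes."""
    return size_a / size_b


def absolute_distance(element_size, apparent_size, angular_resolution):
    """Distance to a familiar surface whose true element size is known.

    apparent_size is in picture elements and angular_resolution in radians
    per picture element, so the element subtends
    apparent_size * angular_resolution radians; the result is in the units
    of element_size.
    """
    return element_size / (apparent_size * angular_resolution)
```

With elements at B half as large as those at A, `relative_distance` returns 2, matching the statement that B is twice as far as A.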

There is one reasonableness condition on texture gradients. The
apparent size of texture elements should decrease toward the horizon.
That is, we don't expect large, nearly level overhangs above us, and
for opaque surfaces below the horizon we must see decreasing apparent
element size toward the horizon.

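The projection of a longitudinal or slanted surface on a picture
plane is obtained by perspective geometry. The principles governing
such a projection are as follows (see Fig. 35). Let $f_d$ be the focal
distance, $\theta_1$ and $\theta_2$ the angles between the optical axis and
the rays to two points of the surface at distances $R_1$ and $R_2$, and
$\sigma$ the slant of the surface away from the frontal plane. The ray to
the image point at angle $\theta_1$ has length $f_d/\cos\theta_1$, and

$$ \cos\theta_2 \approx \cos\theta_1 \quad \text{for small } \theta_2 - \theta_1 . $$

The two image points lie $f_d(\tan\theta_2 - \tan\theta_1)$ apart. By the
similarity of two triangles it follows that, for small $\theta_2 - \theta_1$
and $R_2 = R_1 + \Delta R$,

$$ \Delta R \approx R_1 \tan\sigma\,(\tan\theta_2 - \tan\theta_1) . $$

Now let us define a fractional gradient

$$ G = \frac{\text{fractional change in element size}}{\text{baseline in image}}
     = \frac{\delta_1 - \delta_2}{\tfrac{1}{2}(\delta_1 + \delta_2)\, f_d\,(\tan\theta_2 - \tan\theta_1)} , $$

where $\delta_1$ and $\delta_2$ are the texture element sizes in the image.
Since apparent size is inversely proportional to distance, the numerator
is approximately $\Delta R / R_1$, and after some approximation we obtain
the formula

$$ G \approx \frac{\tan\sigma}{f_d} . $$

Rephrasing the formula in terms of angles on the retina instead of
lengths on the retina, we get

$$ G' \approx \tan\sigma . $$

Thus, we can calculate the angle of the surface with respect to the observer.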

Since the observer knows his own angles with respect to gravity, and he




knows the angle of the surface with respect to himself, he thus knows the
angle of the surface with respect to gravity.

Then the size $s$ of a texture element in the object space can be
computed as follows:

$$ s = \frac{R_1\,\delta_1}{f_d \cos\sigma} . $$
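The texture-gradient calculation can be turned into a few lines of code, assuming the relations G ≈ tan σ / fd and s = R1 δ1 / (fd cos σ), with the focal distance fd measured in picture elements. The function names and the synthetic test values are hypothetical. The test builds two element sizes from a surface of known slant (30 degrees) and checks that the slant is recovered to within the small-baseline approximation.

```python
import math


def fractional_gradient(delta1, delta2, theta1, theta2, fd):
    """G = (delta1 - delta2) / (0.5 (delta1 + delta2) fd (tan theta2 - tan theta1))."""
    baseline = fd * (math.tan(theta2) - math.tan(theta1))
    return (delta1 - delta2) / (0.5 * (delta1 + delta2) * baseline)


def slant_from_gradient(g, fd):
    """Invert G = tan(sigma) / fd to obtain the slant sigma in radians."""
    return math.atan(g * fd)


def object_size(delta, distance, fd, sigma):
    """Texture element size in object space: s = R * delta / (fd * cos sigma)."""
    return distance * delta / (fd * math.cos(sigma))
```

Because G is divided by fd and then multiplied by it again, the recovered slant does not depend on the focal distance; fd matters only for converting image sizes to object sizes.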

How sensitive are estimates of the distance to the assumption that
the ground is level?

Consider Fig. 36.

Fig. 36

We want to calculate the distance from the observer to the ground for the
level and non-level cases. $\beta$ is the angle between the horizon and the
observer's line of sight, and $\alpha$ is the angle of the slanted surface. Then

$$ \frac{S_1}{S_2} \approx \frac{\beta}{\beta + \alpha} $$

is the ratio between the distance $S_1$ to the slanted surface and the
distance $S_2$ to the level surface. The formula shows that there is a
fairly strong dependence on $\alpha$, except for small distances.
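Because the figure is not reproduced here, the following sketch assumes one concrete geometry: the observer's eye is at height h above the point where the ground starts to rise, the ground rises away from the observer at angle α, and the line of sight is depressed by β below the horizon. Intersecting the line of sight with the two surfaces gives S2 = h / sin β for level ground and S1 = h cos α / sin(β + α) for the slanted ground, so S1/S2 = cos α sin β / sin(β + α), which reduces to β/(β + α) for small angles. The function names are hypothetical.

```python
import math


def slant_distance_ratio(beta, alpha):
    """S1/S2: distance to ground rising at alpha over distance to level ground.

    beta is the depression of the line of sight below the horizon (radians);
    the eye height cancels out of the ratio.
    """
    return math.cos(alpha) * math.sin(beta) / math.sin(beta + alpha)


def small_angle_ratio(beta, alpha):
    """Small-angle approximation beta / (beta + alpha)."""
    return beta / (beta + alpha)
```

Near the horizon (β = 0.02 rad) a slope of α = 0.02 rad halves the distance, while looking steeply down near the feet (β = 1.2 rad) the same slope changes it by less than 1%, which is the behavior described above.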

As an example of the texture gradient and its recognition, we
present a picture of the ocean (see Fig. 37). Without recognizing the
texture gradient, we find a partition of the picture into several regions
(see Fig. 38). All regions are described as monodirectional textured
regions, with the same directionality but with different wavelengths.
However, the wavelength changes linearly in the vertical direction across
the picture, from the bottom of the picture to the top, where the
wavelength is 8. Thus, by recognizing the texture gradient, we recognize
the whole picture as one textured region, displayed in Fig. 39.
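Recognizing the ocean as one region amounts to checking that the stacked regions share a direction and that their wavelengths change linearly with height in the image. A minimal sketch under assumptions: each region is summarized by a hypothetical triple (center row, direction in degrees, wavelength), at least three regions are required so that the line fit means something, and the 0.5 RMS tolerance is arbitrary.

```python
import numpy as np


def one_gradient_region(regions, max_rms=0.5):
    """True if stacked texture regions look like one surface with a texture gradient.

    regions: list of (center_row, direction_degrees, wavelength) triples.
    Requires a single shared direction and wavelengths lying on a straight
    line against image row (RMS residual of a least-squares fit <= max_rms).
    """
    if len(regions) < 3 or len({r[1] for r in regions}) != 1:
        return False
    rows = np.array([r[0] for r in regions], dtype=float)
    waves = np.array([r[2] for r in regions], dtype=float)
    coeffs = np.polyfit(rows, waves, 1)
    residuals = waves - np.polyval(coeffs, rows)
    return bool(np.sqrt(np.mean(residuals ** 2)) <= max_rms)
```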




5.3 The World Model

In chapter 2 we looked closely at elements of an outdoor scene,
involving grass, sky, clouds, water, and trees. One of our purposes
was to introduce the sort of texture descriptors which we have implemented.
The other was to lead the way into a discussion of our world
model. We saw a great range of variation in sizes, colors and other
properties of texture elements in these outdoor scenes. Grass ranges
in color through greens, browns, and yellows. Trees range from a few
feet to a few hundred feet in height. Because of this variation and the
variation of apparent size of objects at different distances from the
observer, it appears that no immediate identification of image textures
with elements of the world is reliable. In some cases, the understanding
depends on perhaps unconscious reasoning: the spray on rocks is not very
similar in appearance to the ocean around it. In many cases, the
identifications are simply resolved by considering relations between image
regions: motion obscuration identifies trees in front of clouds, shadows
identify trees as standing above ground, and obscuration implies a background.
Relative depth determines that the ground is roughly level and that trees
stand above the ground.

It is reasonable to question whether a model which must allow so
much flexibility, covering the range of sizes for objects and the
variation in relations, is of any use at all. There are several ways
in which it is useful. The first is that certain relations are reasonably
stable. The sky is bright against the horizon. The proportions of grass
and trees are roughly independent of size. Certain regular shapes are
usually man-made. The second is that much of the variation is connected



with subsidiary conditions. If trees appear in different colors, they are
different species and have other identifiable properties. If the grass
is yellow, then it must be dry. An apparent size gradient probably means
a distance gradient.

However, the usual mode of perception is continuous perception.
In scene analysis, we often think of showing a single picture with no
context and expect the observer to understand it. Indeed, humans can
usually do just that. But the bulk of perceptual activity is involved
in moving in a world in which changes happen slowly and locally. Most
of the world is nearly unchanged from one moment to the next. Most of
the recent perceptual understanding is useful at any instant; the system
knows a great deal about the environment and makes incremental changes
to its model. The making of the changes to the model is aided by the
detail of the knowledge already available.

That does not mean that we can do without the ability to actually
build up the model, either from the picture shown out of context, or
guided by an already detailed model. But it does mean that a large
part of perceptual activity is guided by detailed models.

Another aspect of the world model is that it contains the information
about the observer's point of view. The observer's motion provides a
depth sense equivalent to stereo, but much more useful for distant
objects. Distance estimates using motion parallax depend on the observer's
estimate of his motion. Stereo distance measurements depend upon a model
for the convergence position of the two eyes, the eye separation, and the
correspondence of the coordinate systems of the two eyes. An equivalent
observer model has been implemented at this laboratory in the work of
Sobel (1970) and Tenenbaum (1970), and the use of the observer's motion for
depth perception has been implemented by Nevatia (unpublished). Formally, the
model has two levels:

(a) Regions in the object space, objects and collections of
objects, called the elements of the model;

(b) Structured descriptions of the elements in the object space.
These descriptions are almost directly interpretable in a
program as procedures.

A world model is a dynamic structure that changes during the
identification process. The description of the elements of the model
is carried out in the object space and, wherever possible, with
counterparts in the image space. Not all descriptors in the object
space have a meaningful counterpart in image space. An example is the
size of objects, which has no direct image counterpart but can be
interpreted from distance estimates and apparent sizes.

The  properties  of  grass,  sky,  water  and  trees  have  been  described 
in  Table  1.  All  these  descriptions  are  included  in  the  world  model  and 
some  new  ones  are  included  in  Table  7. 

All  objects  in  the  model,  except  rocks  and  unnamed  objects,  have 
broken  boundaries. 

One may wonder what other descriptors (besides texture descriptors
and color descriptors) could be relevant in the model. Unlike the
case of grass blades, water waves, and trees, where size plays
an important role, the size of rocks varies so much that it is hardly
a useful feature for them. On the other hand, the shape of rocks
(bloblike) is significant because it can be contrasted with the linear
shapes of grass leaves or water waves. However, the only rocks of
interest are those which are big enough to stick far out of the ground
(they might impede navigation). Here we worry about the relevance of size
and shape of texture elements. What about the size and shape of regions?
Size and shape are not significant for regions of grass, ocean, forest,
and ensembles of rocks.


Table 7. Color attributes and spatial relationships of the elements of the world model

Grass
   Color attributes: usually green, sometimes yellow or light brown, never blue.
   Spatial relationships: located on the ground, under the sky and trees.

Ocean
   Color attributes: blue or green, sometimes gray with silver waves, never red.
   Spatial relationships: located at the ground plane, below the sky and trees.
   In the image space, ocean and grass, trees, or rocks could form overlapping
   regions.

Sky
   Color attributes: light blue; the brightest area in the scene.
   Spatial relationships: the farthest region in the scene, always above any
   element of the world model. In the image space it can form overlapping
   regions with grass, trees, and rocks.

Clouds
   Spatial relationships: objects in the sky.

Trees
   Color attributes: the crown is usually green, sometimes yellow, brown, or red;
   the trunk is dark brown.
   Spatial relationships: below the sky, and above grass or ocean.

Rocks
   Color attributes: all shades of gray, brown, or red.
   Spatial relationships: always below the sky and on the ground; they could be
   scattered in grass and water.

Unnamed objects
   Color attributes: any color.
   Spatial relationships: on the ground, below the sky.
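A tentative labeling of image regions can be checked mechanically against part of Table 7. The sketch below encodes only the "never" colors and the "below" relations from the table, plus the rule that the sky is the brightest area; the encoding, the region summary (centroid row, color name, mean brightness) and the function name are hypothetical. Image rows grow downward, so "below" is tested as a larger centroid row, which is only a rough image-space stand-in for the object-space relation.

```python
# Hypothetical encoding of part of Table 7.
WORLD_MODEL = {
    "grass": {"never": {"blue"}, "below": {"sky", "trees"}},
    "ocean": {"never": {"red"}, "below": {"sky", "trees"}},
    "trees": {"never": set(), "below": {"sky"}},
    "rocks": {"never": set(), "below": {"sky"}},
    "sky": {"never": set(), "below": set()},
}


def check_labeling(regions):
    """List the world-model constraints violated by a tentative labeling.

    regions: label -> {"row": centroid row, "color": color name,
                       "brightness": mean brightness}
    """
    problems = []
    for label, props in regions.items():
        rules = WORLD_MODEL.get(label)
        if rules is None:
            continue  # objects not covered by this encoding are not checked
        if props["color"] in rules["never"]:
            problems.append(f"{label} cannot be {props['color']}")
        for other in sorted(rules["below"]):
            if other in regions and props["row"] <= regions[other]["row"]:
                problems.append(f"{label} should be below {other}")
    if "sky" in regions:
        brightest = max(regions, key=lambda k: regions[k]["brightness"])
        if brightest != "sky":
            problems.append("sky should be the brightest region")
    return problems
```

An empty list means the labeling is consistent with these rules; each violation points at a hypothesis the higher level program would reconsider.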


5.4 The Higher Level Program

We briefly discuss a suggested higher level program, which we
simulate. We concentrate on two scenes. Fig. 40 contains three pictures
taken of the scene in Fig. 1 through three filters: red, green, and blue.
Another figure contains four pictures of a scene: the top two and the
bottom left pictures correspond to the red, green, and blue filtered
pictures, respectively, and the bottom right picture is the brightness
function of the scene.


Fig. 40


Three processes are involved in the interpretation of the scenes; these
might be called:

organization of regions having continuous properties;
determination of spatial relations;
identification of elements in object space.

They are not strictly hierarchical, since identification determines new
spatial relations and suggests other low level organizations.

Some of the mechanisms for the organization of continuous regions were
previously discussed. The regions based on continuity in color and some
feature descriptors are natural starting places. Proximity provides the
basis for the suggestion of texture super-regions which are disconnected,
but may be usefully considered as a unit. In this operation, we group
together nearby regions of like color or like textural properties. The
next mechanism is that of hypothesis-verification: a particular color or
texture region is a hypothesis of continuity. If the region has
physical continuity, we should find that other properties are continuous
over the region. If the boundaries are false, then there should be
textural properties which continue across the boundary, from which we
would suggest a continuity which would be tested by looking for other
continuity. If the boundaries correspond to physical boundaries, we will
usually be able to find a discontinuity in some textural property.
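The hypothesis-verification step can be sketched as a comparison of descriptors measured on the two sides of a proposed boundary. The descriptor names, the tolerances, and the rule that a single discontinuous descriptor is enough to keep the boundary are assumptions of this sketch, not a specification of the program.

```python
def verify_boundary(side_a, side_b, tolerances):
    """Hypothesis-verification for a boundary between two regions.

    side_a, side_b: descriptor name -> value measured on each side.
    tolerances: descriptor name -> largest difference still counted as continuous.
    Returns ("physical", discontinuous descriptors) when some descriptor breaks
    at the boundary, otherwise ("false", all descriptors, which all continue).
    """
    broken = [name for name, tol in sorted(tolerances.items())
              if abs(side_a[name] - side_b[name]) > tol]
    if broken:
        return ("physical", broken)
    return ("false", sorted(tolerances))
```

A "false" verdict corresponds to merging the two regions into one; a "physical" verdict keeps the boundary and names the property that justifies it.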

We can infer a few spatial relations from the texture gradient,
from guesses about interposition (which object is in front of which), and
from the observer orientation and position, combined with the ground plane
hypothesis. In a complete system, we could call on depth perception by



stereo and motion parallax. Our inferences would be a good guide to
economical use of these modules.

Identification, in our suggested system, proceeds both from the
world model and from the data. Some elements of the world model are
better starting places than others. We assume that the sky would be
easily established in most cases. Other image elements should be
approached after finding the important structural elements in object
space, i.e., sky, ground plane, and trees. We are assuming that a full
variety of properties and relations aids us in making initial and tentative
identifications of sky, etc.

In Fig. 40, which we call WATER, we have two major regions which
correspond to grass, a region which corresponds to the rock, and two
regions which correspond to the water. In the scene called ROSES, the
sky appears as one large region and several small regions; there are three
regions of the bush, and several small regions which correspond to roses.

In the analysis of these two scenes based only on texture analysis
without guidance, the scene WATER is described partly adequately. After the
texture gradient suggests further continuities, the grassy regions merge,
and there remain three main image elements which correspond to grass,
rock, and water; see Fig. 43.

This texture organization is suitable for providing hypotheses of
continuity for regions which are broken by color organization for WATER;
see Fig. 44. The similarity in color of the joined color regions confirms
the texture continuity. In ROSES, the sky is adequately described by
texture, while the bush and flower regions are chaotic, as one can see in
Fig. 45. This reflects one of the inadequacies of the Fourier transform,




its weakness with feature sizes approaching the window size. This weakness
is normally remedied by subdividing regions with slow changes which
correspond to probable region boundaries. That step was suppressed in this
version of the program.

The  region-growing  does  not  succeed  in  isolating  the  flowers  by 
cutting  up  windows  containing  flowers  to  partition  off  smaller  cells  of 
adjoining  areas  of  the  bush.  This  is  the  worst  performance  of  the 
texture  region  finding  process,  but  it  is  instructive.  On  the  whole,  the 
unaugmented  texture  region  analysis  is  unable  to  aid  in  proposing  useful 
alternative  hypotheses  for  organization.  However,  texture  boundaries  for 
the  sky  coincide  with  color  boundaries  (the  color  boundaries  of  the  roses 
are  displayed  in  Fig.  46),  and  a  slight  relaxation  of  the  criteria  for 
continuity,  verified  by  continuity  in  contrast  among  the  color  components, 
does  provide  a  set  of  larger  texture  regions  among  the  bush  and  flowers. 
Even  in  that  worst  case,  the  jumbled  areas  of  color  correspond  to  regions 
of  moderate  size  under  texture,  so  that  there  are  no  large  regions  of 
the picture which appear entirely chaotic under both aspects. The texture
descriptors are useful for analyzing the color regions, and are more
useful when used in that directed mode. The element size and contrast are
meaningful when restricted to the bush; in the unaided texture analysis,
these descriptors mix the flowers and bush.

In evaluating our higher level procedures, we usually also re-evaluate
the quality of the lower level modules. We find significant ways
in which they could be improved, ways which would best be implemented at
that level. In general, it is better to proceed to a fully developed
system than to put disproportionate work into the low level. We will later
specify what improvements we would make in the low level modules.

Let us make the preliminary organization of the two scenes. With
ROSES, we begin with proximity of color regions. The bush regions and
the flower regions are alike in color; for example, in two areas of the
bush, the color coordinates r/(r+g+b) and g/(r+g+b) are:

Sample 1: (.46, .43)

Sample 2: (.47, .40)

Thus we can conjecture these to form a super-region. Let us compare contrast
and dominant wavelength for these two color regions which we conjecture
to be similar. Compare the two color regions in Fig. 46 with Tables
8, 9, and 10 of average intensity, wavelength, and contrast over 8x8
windows. We see that the dominant wavelength is short over much of these
two color regions. In fact, if we define a region from the small
wavelengths (< 4), the region spreads over most of the bush. In the scene,
the sky is a region under color and all texture descriptors. The sky
boundary in color is reinforced by the existence of texture boundaries.
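The two steps used for ROSES, conjecturing a super-region from close chromaticities and then defining a region from windows with a short dominant wavelength, can be sketched as follows. The chromaticity threshold of 0.05 and the small wavelength grid in the test are hypothetical (Tables 8-10 are not reproduced here); only the two sample coordinates and the cut at wavelength 4 come from the text.

```python
from collections import deque

import numpy as np


def chroma_close(a, b, threshold=0.05):
    """True if two (r, g) chromaticity pairs differ by at most `threshold` in each coordinate."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= threshold


def components(mask):
    """4-connected components of a boolean grid of windows (breadth-first search)."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    comps = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        queue, comp = deque([start]), []
        while queue:
            y, x = queue.popleft()
            comp.append((int(y), int(x)))
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        comps.append(comp)
    return comps
```

Applied to a grid of per-window dominant wavelengths, `components(wavelengths < 4)` returns the connected groups of short-wavelength windows, the kind of region said above to spread over most of the bush.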

As we have indicated, textural properties are probably adequate to confirm
continuity of the regions suggested by color for bushes and flowers, and
to show discontinuities of frequency. In WATER, the two regions corresponding
to water are joined by proximity in color and continuity in texture.

The water boundary shows up strongly as a change in color and in texture:
from directional to homogeneous at the water-rock boundary, and different
directionalities with distinctly different colors at the water-grass
boundary. The grass is continuous in directionality, size, and color.

We  can  now  make  correspondence  with  the  world  model.  Since  the  sky  is 
often  prominent  in  outdoor  scenes,  we  attempt  to  find  the  sky.  We  look 




at  white  and  blue  regions  which  are  near  or  above  the  horizon.  In 
WATER,  we  might  try  the  region  which  is  really  water.  The  color  is 
acceptable,  but  the  directionality  is  very  unlikely  for  sky,  and  the 
contrast  and  size  of  texture  elements  is  also  unlikely.  (This  estimate 
is  based  on  a  few  months  of  sporadic  sky  watching.  (Of  course,  there 
are  directional  clouds,  'Vnackerel  sky",  but  it  seems  quite  infrequent. 
Also,  the  clouds  seem  to  have  much  lower  frequency.)  The  water  region  is 
below  the  horizon.  If  there  were  a  significant  view,  we  could  see  a 
texture  gradient  and  thus  substantiate  that  the  surface  ia  flat.  Also, 
in  continuous  perception,  we  would  find  that  the  water  motion  is  very 
different  from  cloud  motion.  Motion  would  also  allow  interpretation  of 
the  breakers  around  the  rock  as  part  of  the  water.  The  region  correspond¬ 
ing  to  grass  is  directional,  low  contrast,  and  has  a  texture  gradient, 
implying  that  it  is  horizontal.  The  color  is  consistent  with  grass, 
which  lies  on  the  ground  plane.  From  the  ground  plane  assumption,  we 
can  estimate  the  size  of  the  elements  of  the  grass: 
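$$\text{size} \approx \text{image size} \times \text{angular resolution} \times \text{distance} = 2 \times \tfrac{1}{666} \times 300\ \text{cm} \approx 0.9\ \text{cm}$$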

where  we  estimated  the  image  size  previously,  the  camera  parameters  are 
known,  and  the  distance  is  obtained  from  a  crude  guess,  but  is  known  in 
principle  from  the  observer  position  and  orientation.  The  size  is  also 
consistent with grass blades. For the rock, neither color nor homogeneity
tells us very much. Since the rock is convex downward on its boundary with
water,  we  assume  that  the  rock  is  in  front  of  the  horizontal  water  surface. 
We  assume  thus  that  it  is  an  object  which  sticks  up  from  the  surface, 
and  calculate  the  vertical  height  and  length  along  the  ground. 


From the image, the texture gradient tells us that the distance at the rock is about
4 times that at the front of the picture. Thus, the above expression
gives:

$$18 \times \tfrac{1}{666} \times 1200\ \text{cm} \approx 36\ \text{cm}$$

while the width of the rock is approximately 300cm. These are only
approximate values which depend on our guesses about the ground plane
and texture gradient. On the other hand, the conclusions depend most
strongly on relative size. Grass elements are small; rocks
are often big compared to grass. We can make the comparison between
the rock and the grass near the base of the rock. In the image, the rock is
big, and from all assumptions about objects in the image being farther
away as they recede in apparent position toward the horizon, the rock is
much bigger than the elements of the grass. These considerations lend some
support to the assumption.
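The size estimates can be checked with a few lines of arithmetic. This sketch uses the small-angle relation size ≈ (image size in pixels) × (angular resolution in radians per pixel) × distance, with 1/666 rad per pixel as in the text. The front distance of 300 cm is an assumption, inferred from the rock distance of 1200 cm being about four times the front distance; the helper name `object_size_cm` is hypothetical. With the printed factors for the rock (18 pixels at 1200 cm) the product comes to about 32 cm rather than 36 cm; either way the rock comes out roughly 36 times the size of a grass element, which is the relative-size comparison the argument relies on.

```python
def object_size_cm(image_size_px, angular_resolution_rad, distance_cm):
    # Small-angle approximation: physical size ≈ subtended angle × distance.
    return image_size_px * angular_resolution_rad * distance_cm


ANGULAR_RESOLUTION = 1 / 666        # radians per pixel, the camera value used in the text
front_distance = 300                # cm, assumed distance at the front of the picture
rock_distance = 4 * front_distance  # texture gradient: the rock is about 4x farther away

grass = object_size_cm(2, ANGULAR_RESOLUTION, front_distance)
rock = object_size_cm(18, ANGULAR_RESOLUTION, rock_distance)
print(f"grass ≈ {grass:.1f} cm, rock height ≈ {rock:.0f} cm")
# prints: grass ≈ 0.9 cm, rock height ≈ 32 cm
```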

In ROSES, we begin by attempting to find the sky. The only region
of acceptable color is the sky itself. The color is white, indicating
clouds, with low contrast as seen in Table 8. The texture is homogeneous.
As a verification, we might find blue patches, find motion, and find that
the distance of this region is very great. The region is far above the
horizon and is very bright; see the brightness in Table 6. From the
concave downward boundary with the other regions, we assume that it is
behind the green elements. With the identification of the sky, we find that
the green elements are in front of the sky, thus probably approximately
vertical and frontal (they show no systematic texture gradient, also
indicating that they are vertical). The texture is blob-like; the blob
size is interesting. If we can guess that these are leaves rather than
leaf clusters or branches, then we can estimate the distance to the bush.
Finding the stems would aid in that. Because the bush is probably
vertical, it is not grass. If we include leaf elements, fruits, and
flowers in our descriptions of trees and bushes, then by guessing that
the flowers are really associated with the bush which surrounds them, we
can guess the scale of the leaves relative to the flowers, and thus
establish that the texture elements are leaves and establish approximate
distance. Of course, at any level, we could establish relatively
unique elements to correspond in two views, and determine distance by
stereo or motion.


6.  CONCLUSIONS 


We presented a representation of textured scenes which was not a
two-dimensional representation of the projected image, but a three-dimensional
representation of the elements and spatial relations in object space. We
feel that spatial relations such as "grass is found on the ground plane"
and the nearly infinite distance of the sky are characteristic of these
elements, and help more than any other properties to
identify them and to orient the observer. Our representation is effective,
also, in that it is segmented into distinct elements, which are described
by a hierarchy of texture regions and texture elements. Textured regions
may be texture elements of a super-texture, or texture elements may be
textured regions of a sub-texture. This is not only a formal nicety, but
a usual part of our description of outdoor scenes; for example, in trees,
the leaves are texture elements of leaf clusters or branches, which are
texture elements of a tree, which is a texture element of tree clusters.

The description of the shape of texture elements depends heavily on a linear
approximation to shape, and describes directionality, width of texture
elements, and spacings. We argue from psychological evidence that these
are the most important descriptors, and further, that they are natural
for computer implementation. These descriptors are most useful for
directional textures. The representation is included in a world model for
which an example was given, but which awaits implementation with the
high level procedures.

A  simple  color  region  analysis  was  very  useful  in  texture  analysis. 

The normalized color coordinates, r/(r+g+b) and g/(r+g+b), were
compared for continuity, and regions were defined by neighbors of
continuous color. There are some potential problems in an analysis
of color in outdoor scenes, since many of the texture elements are very
small. Typically, the regions are leaves or blades of grass. As an
expedient to find larger regions, we have averaged over a small window
and compared colors of adjacent windows. As a consequence, we sacrificed
localisation of edges. The expedient is only partially successful, however.
We notice that the averaging works best among clusters of leaves where the
color is uniform to begin with. Where the leaves are isolated against the
sky, the color contrast is large, and continuity of the averages occurs only
by chance. Thus, the averaging is not very successful. A better mechanism
to define larger regions of color is perhaps to go to the computationally
more difficult operation of finding like colors within windows, that is,
to implement color regions based on proximity rather than continuity.
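The averaging expedient (mean normalized color over small windows, then joining adjacent windows whose means are continuous) might be sketched as follows with NumPy. The window size `k`, the tolerance `tol`, 4-adjacency between windows, and the function names are assumptions for illustration. This version joins windows by continuity, so a slow drift of color along a chain of windows can merge quite different colors; a proximity-based variant would instead compare each window with a reference color for the whole region.

```python
import numpy as np


def window_chromaticity(rgb, k):
    """Mean (r/(r+g+b), g/(r+g+b)) over non-overlapping k x k windows.
    rgb: float array of shape (H, W, 3); H and W are assumed divisible by k."""
    total = rgb.sum(axis=2, keepdims=True)
    chroma = rgb[..., :2] / np.maximum(total, 1e-9)  # per-pixel normalized (r, g)
    h, w = chroma.shape[0] // k, chroma.shape[1] // k
    # Split into k x k blocks and average within each block.
    return chroma.reshape(h, k, w, k, 2).mean(axis=(1, 3))


def color_regions(means, tol=0.05):
    """Label windows by continuity: 4-adjacent windows whose mean coordinates
    differ by at most tol (in both coordinates) receive the same label."""
    h, w, _ = means.shape
    labels = -np.ones((h, w), dtype=int)
    next_label = 0
    for i in range(h):
        for j in range(w):
            if labels[i, j] >= 0:
                continue
            labels[i, j] = next_label
            stack = [(i, j)]
            while stack:  # flood fill through continuous neighbors
                y, x = stack.pop()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] < 0:
                        if np.abs(means[y, x] - means[ny, nx]).max() <= tol:
                            labels[ny, nx] = next_label
                            stack.append((ny, nx))
            next_label += 1
    return labels


# A toy 8 x 8 picture: green foliage on the left, gray sky on the right.
rgb = np.zeros((8, 8, 3))
rgb[:, :4] = (60, 150, 40)
rgb[:, 4:] = (100, 100, 100)
print(color_regions(window_chromaticity(rgb, 4)))  # two regions: left and right columns
```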

The obvious value of defining color regions which ignore the brightness
fluctuations of individual leaves should not lead us to ignore brightness
edges and consider only color edges. We also make use of the size of the
brightness regions, the leaves in this case.

A sheaf-theoretic description formalizes the process of region-growing
and gives an exact account of the shift from local to global and vice
versa. One must be cautious about interpreting the sheaf-theoretic
notions in the context of color and texture regions. Due to sampling, we
have a finite scale of window sizes, and the definition must include a
least window size, the size of texture elements. No such discreteness
conditions are embedded in sheaf theory.

In the implementation of texture descriptors, we were able to derive
those spatial domain descriptors that we found important from Fourier
transforms over windows of various sizes. Directional and non-directional
components were separable to a useful extent. These reflected shapes
of [illegible] and their spatial relations, or statistical properties
of irregular textures. We argued that the human description of an image
makes use of a step function approximation; we often describe in terms of
regions of constant intensity. We obtained analytic expressions for the
contrast, the element size and spacing, and some location information;
these were based on a spatial domain model of pulses of equal amplitude,
width, and spacing on a uniform background level. The implementation of
Fourier transform descriptors had some minor implementation difficulties
which are usually overlooked. They are consequences of the fact that
the Fast Fourier Transform is really a Fourier series and not a Fourier
transform. A Fourier transform has no preferred directions, while the
Fourier series has preferred axes along the x and y coordinate axes.
This introduces a non-isotropy in the fan filter. We have used only a
straightforward fan filter and have not attempted to compensate for the
peculiarity of the Fourier series. There is also spurious broadening
of peaks, which is a consequence of the Fourier series. Despite these
difficulties, the spectra show useful directionality properties, and we
have been able to work with them.
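To illustrate the kind of directional measurement discussed above, here is a sketch of a simple fan (wedge) filter: it takes the 2-D FFT power spectrum of a window and sums the power falling in each of `n_sectors` orientation sectors. The sector layout and the function name `fan_filter_energies` are hypothetical; the report does not give its filter's exact shape. Because the FFT samples frequencies on a rectangular grid, sectors along the x and y axes collect a different pattern of samples than diagonal ones, which is the non-isotropy mentioned in the text.

```python
import numpy as np


def fan_filter_energies(window, n_sectors=8):
    """Sum the power spectrum of a 2-D window over angular sectors ("fan" filter).
    Sector s holds frequency vectors with orientation in [s*pi/n, (s+1)*pi/n)."""
    w = window - window.mean()                    # remove the DC term
    power = np.abs(np.fft.fft2(w)) ** 2
    fy = np.fft.fftfreq(w.shape[0])[:, None]      # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(w.shape[1])[None, :]      # horizontal frequencies
    angle = np.mod(np.arctan2(fy, fx), np.pi)     # fold opposite frequencies together
    sector = np.minimum((angle / np.pi * n_sectors).astype(int), n_sectors - 1)
    energies = np.zeros(n_sectors)
    np.add.at(energies, sector.ravel(), power.ravel())
    return energies


n = 32
x = np.arange(n)
stripes = np.tile(np.sin(2 * np.pi * x / 8), (n, 1))  # intensity varies along x only
print(np.argmax(fan_filter_energies(stripes)))    # 0: frequency vector along x
print(np.argmax(fan_filter_energies(stripes.T)))  # 4: frequency vector along y
```

The sector index is the orientation of the frequency vector, which is perpendicular to the stripes themselves.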

The more serious problems with the Fourier transform are conceptual
difficulties which are true of any orthogonal expansion and of the true
Fourier transform. Interpretation is based on the power spectrum, and
phase information is ignored. These transforms are non-local, and give
very poor edge and position information. To an extent, we have tried to
get around this by the expedient of using local windows. This provides a
crude localization, which was not adequate for many purposes. The usefulness
[the remainder of this passage is illegible in the source scan]

[illegible] Textured regions were obtained [illegible]. [illegible] the
choice of window size would be better left to a higher level choice among
suggested regions. The actual program which gave textured regions
classified the descriptors according to monodirectional, bidirectional,
blob-like, homogeneous, and noisy. Then adjacent regions were merged into
continuous regions based on continuity of the descriptors. A change of scale
was used if there was a strong frequency 1 component (that is, there
probably was an edge in the window) or if there was a bidirectional texture
which could arise from a boundary of two directional textures. A second
region-growing pass relaxes the criteria and ignores classifications to
extend large regions by including acceptable small neighboring regions
under relaxed criteria.
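A two-pass region growing of this kind can be sketched as follows: first merge 4-adjacent windows of the same texture class, then, ignoring classes, absorb small regions into a neighbor. The class names, the `min_size` threshold, and the rule "absorb a small region into its largest neighboring region" are hypothetical stand-ins for the report's relaxed criteria; each window is assumed to carry a single class label.

```python
from collections import Counter

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-adjacency between windows


def merge_regions(classes):
    """First pass: 4-connected windows with the same texture class form one region.
    `classes` is a 2-D list with one class label per window; returns region ids."""
    h, w = len(classes), len(classes[0])
    region = [[-1] * w for _ in range(h)]
    next_id = 0
    for i in range(h):
        for j in range(w):
            if region[i][j] != -1:
                continue
            region[i][j] = next_id
            stack = [(i, j)]
            while stack:  # flood fill over windows of the same class
                y, x = stack.pop()
                for dy, dx in STEPS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and region[ny][nx] == -1
                            and classes[ny][nx] == classes[y][x]):
                        region[ny][nx] = next_id
                        stack.append((ny, nx))
            next_id += 1
    return region


def absorb_small(region, min_size=2):
    """Second, relaxed pass: ignore the classes and merge every region with fewer
    than `min_size` windows into its largest 4-adjacent neighboring region."""
    h, w = len(region), len(region[0])
    sizes = Counter(r for row in region for r in row)
    out = [row[:] for row in region]
    for small in [r for r, n in sizes.items() if n < min_size]:
        neighbors = set()
        for y in range(h):
            for x in range(w):
                if region[y][x] != small:
                    continue
                for dy, dx in STEPS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and region[ny][nx] != small:
                        neighbors.add(region[ny][nx])
        if neighbors:
            target = max(neighbors, key=lambda r: sizes[r])
            for y in range(h):
                for x in range(w):
                    if region[y][x] == small:
                        out[y][x] = target
    return out


classes = [["dir", "dir", "dir"],
           ["dir", "blob", "dir"],
           ["hom", "hom", "hom"]]
first = merge_regions(classes)
print(first)                # [[0, 0, 0], [0, 1, 0], [2, 2, 2]]
print(absorb_small(first))  # [[0, 0, 0], [0, 0, 0], [2, 2, 2]]
```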

A guided program determined textured regions based on information
passed down from above. This program was guided by the user, but the
advice could equally well have come from other programs. The guided
programs determined texture regions within specified areas and according
to specified similarity criteria, for example, frequency and contrast.
Continuity could be explored on the basis of separate parameters.

We see some incremental improvements to the region-growing based
on Fourier descriptors. We would add an alternate tactic for windows with
long wavelengths; now we subdivide based on the possibility of an edge in
the window. We would also increase window size to look for a repeated
feature of a larger size. We would eliminate the classification stage now
used. There is some argument in favor of the classification: there are
actually many natural objects which fall into one or another of these
classes. But the classification in the early stages introduces artificial
boundaries and ignores the multiple descriptors which are available. Only
[the rest of this paragraph is illegible in the source scan]

[Several lines are illegible in the source scan.] [...] typical descriptors
and include nearby areas of the image which were not well described in the
earlier analysis with little context.

The regions which come forth do not partition the image neatly. They overlap
and do not cover the whole image. Still, we are able to interpret those
parts of it that are simple to understand. Inferences of spatial relations
are important here. Texture gradient gave estimates of surface orientation.
The ground plane assumption gave a local coordinate system when combined
with the assumption that objects stand out from the ground. These
relations depended primarily on relative distance and relative size
estimates, which were not greatly sensitive to the assumptions. In some
cases it was possible to guess which object was in front of another,
either because of concavity or from identification of the sky or water.

In identifying elements and structure of the world model, our
simulation attempted first to establish the sky. Based on color,
brightness, contrast with the horizon, and position (above the
horizon estimated by gravity), this is assumed to be a simple match. We can
also then determine the sky line and guess which objects stand out from the
ground plane. Sizes of texture elements were of considerable use; knowledge
of size is much more useful here than in the blocks world, where context
is limited. The knowledge of the size of grass blades is more useful
than the knowledge of the size of one particular block.

We have mentioned some incremental improvements to local texture
description and to forming texture regions. The primary weaknesses at
this level are the crude localisation and limited use of proximity in
establishing super-regions. Improvement of the color region-finder is of
primary importance. Since most of the interpretation depended upon
inferences separate from strictly textural properties of areas of the
image, we feel that the most significant next stage is to embed these
elements in a complete visual system. This would involve more than just
implementation of the simulated higher level program. The typical system
would navigate in an outdoor or planetary exploration environment. The
navigation goals make explicit which problems the system needs to solve
at any time. The situation is one of continuous perception, which allows
the model built up at one instant to be used in subsequent problem-solving.
Continuous perception also allows us to tell which objects are moving,
which is of use in outdoor scenes. The complete system would have
stereo and motion parallax for depth at small and great distances. To
a certain extent, the system would avoid relying on solutions which could
be found purely from a single projection, but such a system appears
feasible within the current state of computer vision, while a system
which ignores so much information does not appear to be achievable soon.




APPENDIX

Topological Models

In this section we give a brief account of a possible approach to
the topology of pictures and then explain the sheaf-theoretic model of
textured scenes, involving several different structure sheaves.

The topology we shall use is discrete and is induced by certain
norms, transplanted from the structure of integers. Needless to say,
the purpose of this topology is to make precise the use of such notions
as continuity and proximity.

Once  the  topology  is  fixed,  we  introduce  a  convenient  system  of 
neighborhoods,  called  windows.  These  will  be  used  throughout  this  work. 

Given a textured picture with the discrete topology as indicated
above,  we  assign  to  every  window  over  the  picture  a  structure  of  some 
sort,  depending  on  the  picture  under  the  window  and,  perhaps,  on  some 
fragment  of  prior  knowledge  concerning  the  picture.  The  structure  in 
question  can  be  something  very  simple  such  as  a  set  of  descriptions  or 
something  more  involved  such  as  a  vector  space,  generated  by  the 
attribute  vectors  of  the  picture  under  the  window.  We  emphasize  the 
degree  of  generality  involved  in  the  specific  choice  of  structures. 

In the implementation of our picture identification program we use a
structure induced by the Fourier image of the picture function.

After the species of structures has been selected, we reduce the
degree of freedom in the set of structures by assuming that the structures
over any pair of windows standing in an inclusion relationship are
closely related. That is, one of the structures can be transformed
into the other precisely when the picture function under the two windows
is continuous. The transformation, which in a general situation is
called a homomorphism, depends on the picture function and possibly on
prior knowledge relevant to the picture. If we imagine that the
structures carry the local picture information, then the corresponding
homomorphisms tell us how this information changes as we move from one
window to another. Thus the question as to when and how to join two
locations on the picture is answered by the homomorphisms interrelating
some of the structures.

Topology  and  Metric  of  Digitized  Pictures 

One of the most efficient ways of arriving at the topological model
of a digitized picture is to consider the picture as a set of cells X
and coordinatize or parametrize it by the finite normed two-dimensional
space of integers modulo $\langle n, m\rangle$:

$$\mathbb{Z}_{n,m} = \mathbb{Z}_n \times \mathbb{Z}_m.$$


More specifically, if $A: X \to \mathbb{Z}_{n,m}$ is a selected coordinatization
function, we write $\hat{x} = A(x)$ and put

(i) $x + y = z$ iff $\hat{x} + \hat{y} = \hat{z}$; (Vector addition)

(ii) $jx = y$ iff $j\hat{x} = \hat{y}$; (Scalar multiplication)

(iii) $x \le y$ iff $\hat{x} \le \hat{y}$; (Partial ordering)

(iv) $\|x\| = |i| + |j|$; (Sum norm)

(v) $\langle\!\langle x \rangle\!\rangle = \max(|i|, |j|)$, (Max norm)

where $x, y, z \in X$, $\hat{x} = \langle i, j\rangle$, and $i, j \in \mathbb{Z}$.

The structure $\{X, +, \cdot, \le, \|\;\|, \langle\!\langle\;\rangle\!\rangle\}$ is called the cellular
space, where the conceptual ingredients are, respectively, vector addition,
scalar multiplication, partial ordering, sum-norm (city block norm), and
max-norm.

Both norms induce a discrete topology in the cellular space X.

The intended interpretation of the elements of X is the retina point,
a geometric location of the point information which is of interest in the
input data. Geometrically, we can think of X as a finite, rectangular,
two-dimensional array of congruent squares, whose coordinates are given
by a grid of pairs of integers located at their midpoints. The advantage
of defining X in this way lies in the possibility of using a coordinate-free
(topological) language, and, when necessary, we can carry over the
concepts of vector calculus to X.
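The partial order and the two norm-induced distances of the cellular space can be written down directly. This sketch assumes cells carry ordinary integer coordinates $0 \le i < n$, $0 \le j < m$ and computes distances without modular wrap-around, which matches reading X as a finite rectangular array; all function names are hypothetical. The last lines illustrate why the induced topology is discrete: distances are integers, so every ball of radius 1 is a single cell, and hence every set of cells is open.

```python
from typing import Callable, Set, Tuple

Cell = Tuple[int, int]  # coordinates <i, j> assigned to a cell


def leq(x: Cell, y: Cell) -> bool:
    """Componentwise partial order: x <= y iff both coordinates are <=."""
    return x[0] <= y[0] and x[1] <= y[1]


def sum_dist(x: Cell, y: Cell) -> int:
    """City-block distance induced by the sum norm: |i - i'| + |j - j'|."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def max_dist(x: Cell, y: Cell) -> int:
    """Distance induced by the max norm: max(|i - i'|, |j - j'|)."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def open_ball(x: Cell, r: int, dist: Callable[[Cell, Cell], int],
              n: int, m: int) -> Set[Cell]:
    """Cells of an n x m picture at distance strictly less than r from x."""
    return {(i, j) for i in range(n) for j in range(m) if dist(x, (i, j)) < r}


print(open_ball((2, 2), 1, sum_dist, 5, 5))  # {(2, 2)}
print(open_ball((2, 2), 1, max_dist, 5, 5))  # {(2, 2)}
```

Note that `leq` is only a partial order: $\langle 1, 2\rangle$ and $\langle 0, 5\rangle$ are incomparable.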

When several coordinatization functions (samplings) are given, one
can order them partially by the fineness or coarseness relation. The
finest coordinatization is usually the one suitable for capturing
the ultimately relevant local information concerning gray level, shape,
or color change. Clearly, a finer coordinatization function leads (at
least potentially) to a more complete description of a picture.

The subsets of X are subjected to generalized vector operations,
such as the algebraic sum:

$$A + B = \{a + b \mid a \in A \text{ and } b \in B\}, \quad A, B \subseteq X.$$

We shall not use these operations in this work, since other
operations will play a far more important role. The horizontal and
vertical rectangular subsets of X (i.e., planar intervals) are called
windows:

A is a window iff $A = \{a \in X \mid x \le a \le y\}$, where x and y are
some cells in X. Thus, windows are essentially two-dimensional intervals.

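The empty window is denoted by 0. The set of all windows, Wind,
is partially ordered by inclusion. In fact, it forms a finite lattice
with zero element 0, unit element X, and operations:

Intersection: $A \wedge B = A \cap B$;

Union: $A \vee B = \bigcap \{C \in \mathrm{Wind} \mid A \subseteq C \text{ and } B \subseteq C\}$,

where $A, B \in \mathrm{Wind}$.

The window $A \vee B$ denotes the smallest rectangle containing A and
B. The lattice is not distributive: if A, C, and B are single cells in one
row, with C between A and B, then $(A \vee B) \wedge C = C$, while
$(A \wedge C) \vee (B \wedge C) = 0$. The lattice
$\langle \mathrm{Wind}, 0, X, \vee, \wedge\rangle$ will be the basic structure in
picture identification. (A similar lattice is obtained by taking the
convex subsets of X.)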
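Meet and join in the window lattice reduce to simple coordinate arithmetic. In this sketch a window is assumed to be stored by its two corner cells, `(i0, j0, i1, j1)`, and `None` stands for the empty window 0; the representation and names are hypothetical. The last lines check the three-cell example in which meet and join fail to distribute.

```python
from typing import Optional, Tuple

# A window is (i0, j0, i1, j1): all cells <i, j> with i0 <= i <= i1 and j0 <= j <= j1.
# None stands for the empty window 0.
Window = Optional[Tuple[int, int, int, int]]


def meet(a: Window, b: Window) -> Window:
    """A ∧ B = A ∩ B; the intersection of two windows is a window or empty."""
    if a is None or b is None:
        return None
    i0, j0 = max(a[0], b[0]), max(a[1], b[1])
    i1, j1 = min(a[2], b[2]), min(a[3], b[3])
    return (i0, j0, i1, j1) if i0 <= i1 and j0 <= j1 else None


def join(a: Window, b: Window) -> Window:
    """A ∨ B = the smallest window containing both A and B."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


print(meet((0, 0, 3, 3), (2, 2, 5, 5)))  # (2, 2, 3, 3)
print(join((0, 0, 3, 3), (2, 2, 5, 5)))  # (0, 0, 5, 5)

# Three single cells in one row, C between A and B.
A, C, B = (0, 0, 0, 0), (0, 1, 0, 1), (0, 2, 0, 2)
print(meet(join(A, B), C))           # (0, 1, 0, 1), i.e. C
print(join(meet(A, C), meet(B, C)))  # None, i.e. the empty window
```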

The  choice  of  norms  in  X  induces  a  special  system  of  neighborhoods, 
suitable  for  developing  the  basic  properties  of  continuous  and 
proximal  functions  on  X. 

For every natural number p we define the p-neighborhood (von
Neumann template) of a cell x by

$$N(x;p) = \{y \mid \|x - y\| \le p\}.$$

If we neglect the effect of the picture boundary, a p-neighborhood
forms a diamond-shaped cluster of cells about x.

Another system of p-neighborhoods (Moore templates) is defined
by the max-norm:

$$M(x;p) = \{y \mid \langle\!\langle x - y \rangle\!\rangle \le p\}.$$

These neighborhoods form square windows about x, if we forget
about the effect of the picture boundary.
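Both templates can be enumerated directly, and clipping them to the picture shows the boundary effect the text sets aside. This sketch assumes an n x m picture with 0-based integer coordinates and the non-strict inequality $\le p$; the function names are hypothetical. Away from the boundary, $N(x;p)$ has $2p^2 + 2p + 1$ cells and $M(x;p)$ has $(2p+1)^2$.

```python
def von_neumann(x, p, n, m):
    """N(x;p): cells within city-block distance p of x, clipped to an n x m picture."""
    i, j = x
    return {(a, b)
            for a in range(max(0, i - p), min(n, i + p + 1))
            for b in range(max(0, j - p), min(m, j + p + 1))
            if abs(a - i) + abs(b - j) <= p}


def moore(x, p, n, m):
    """M(x;p): cells within max-norm distance p of x, clipped to an n x m picture."""
    i, j = x
    return {(a, b)
            for a in range(max(0, i - p), min(n, i + p + 1))
            for b in range(max(0, j - p), min(m, j + p + 1))}


print(len(von_neumann((5, 5), 2, 11, 11)))  # 13 cells: a diamond
print(len(moore((5, 5), 2, 11, 11)))        # 25 cells: a 5 x 5 square
print(len(moore((0, 0), 2, 11, 11)))        # 9 cells: clipped at the corner
```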

As pointed out in the introduction, our main interest will be
in pictorial relationships such as neighboring, inside, near, equidistant,
perpendicular, overlap, above, etc., and in pictorial objects such as
figure boundaries, regions, and the like. These are certain
metric-topological entities, definable in terms of the primitives of the
cellular space X. Theoretically one may think of a broader class of
geometric entities (projective, affine, metrical, and topological), but
this is an auxiliary issue now. We shall totally disregard at present
semantic relationships and semantic objects induced by a particular
object-world model.

The starting point of a picture representation is a picture function
$p: X \to \mathbb{R}$, whose values are called gray levels. In the case of colored
pictures, the values $p(x)$ for $x \in X$ are vectors, representing the
intensity of light for a fixed system of colors.

The difficulty with the picture function lies in the fact that it
is a point function, as opposed to an area or set function. We need a
data structure in which the point information is usefully transformed into
local or areal information, which is the only kind we are interested in
here.

In order to achieve this, we associate with every restriction
$p|A$ of the picture function p, where A is a window in X, a structure
carrying the desired local information. But before we explain how
this can be done, we shall review some of the sheaf-theoretic notions.

Presheaves of Pictorial Structures of a Given Species

What are presheaves and what are they good for? These are the
questions we intend to answer in this section. As for the theoretical
details, the reader may consult Bredon (1967). First, we state the
general definition of a presheaf and then we give a number of concrete
examples of presheaves relevant to picture theory.

Let $\langle E, \le\rangle$ be a partially ordered set. Then by a presheaf of
structures of species SIGMA on E we shall mean a pair of sets

$$S = \langle \{S_a \mid a \in E\},\ \{\rho^b_a \mid a \le b\} \rangle$$

such that for all $a, b, c \in E$ the properties displayed below are valid.

(1) $S_a$ is a structure of species SIGMA, and $\rho^b_a$ for $a \le b$ is
a SIGMA-homomorphism $\rho^b_a : S_b \to S_a$, called the connecting (transition)
mapping from $S_b$ to $S_a$.

(2) $\rho^a_a : S_a \to S_a$ is the identity SIGMA-homomorphism (automorphism)
on $S_a$.

(3) $a \le b \wedge b \le c \implies \rho^c_a = \rho^b_a \circ \rho^c_b$.

A presheaf on E will conveniently be denoted by $S = \{S_a; \rho^b_a\}$.

The species SIGMA refers to the type of a structure, which could be
just a plain set, a set endowed with certain relations and/or operations,
or anything that resembles a mathematical structure (group, vector
space, automaton, etc.). The only point to be realized is that the
structures in question should be of the same sort or type. The
SIGMA-homomorphism may be defined in various ways, the simplest, perhaps,
being the structure-preserving mapping from the domain of one structure
into the domain of the other.

Before we launch ourselves into a more specialized study of sheaves,
it seems useful to illustrate the definition of a presheaf by a couple
of intended interpretations. This will hopefully help us to envisage
the picture-theoretic applications.

One of the simplest presheaves is the constant presheaf of sets.
Here $S_a = S$ is a fixed set of elements and $\rho^b_a$ is the identity
mapping on the fixed set S.

(a) Presheaf of continuous functions

Let X be a topological space. For each $V \subseteq X$ let $S_V$
be the set of all continuous real-valued functions $f: V \to \mathbb{R}$, and for
$W \subseteq V$ let $\rho^V_W: S_V \to S_W$ be the mapping which assigns to each
$f \in S_V$ its domain restriction $f|W$. Then, of course, $f|W \in S_W$, since a
restriction of a continuous mapping is again continuous. This construction
gives a presheaf $\{S_V; \rho^V_W\}$ on the set of all subsets of X,
partially ordered by inclusion. We call it the presheaf of continuous
real-valued functions on X.
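On the discrete cellular space every function is continuous, so example (a) becomes the presheaf of all gray-level functions on windows. The following is a sketch under that reading: a section over a window is a dictionary from cells to values, and the connecting map is restriction; the names `window_cells`, `restrict`, and `glue` are hypothetical. The `glue` step goes beyond the presheaf axioms: it is the sheaf (gluing) condition, the formal counterpart of joining local windows into a larger region when they agree on their overlap.

```python
from itertools import product


def window_cells(i0, j0, i1, j1):
    """Cells of the window {a | <i0, j0> <= a <= <i1, j1>}."""
    return frozenset(product(range(i0, i1 + 1), range(j0, j1 + 1)))


def restrict(section, V):
    """Connecting map: restrict a function defined on a window W to a subwindow V."""
    if not V <= set(section):
        raise ValueError("V must be contained in the domain of the section")
    return {c: section[c] for c in V}


def glue(f, g):
    """If f and g agree on the overlap of their domains, return the unique function
    on the union of the domains that restricts to f and to g; otherwise None."""
    overlap = set(f) & set(g)
    if any(f[c] != g[c] for c in overlap):
        return None
    return {**f, **g}


# A toy picture function p(i, j) = i + j on a 4 x 4 window.
W = window_cells(0, 0, 3, 3)
V = window_cells(0, 0, 2, 2)
U = window_cells(1, 1, 2, 2)
p = {c: c[0] + c[1] for c in W}

print(restrict(p, W) == p)                             # identity axiom (2): True
print(restrict(restrict(p, V), U) == restrict(p, U))  # composition axiom (3): True
top, bottom = restrict(p, window_cells(0, 0, 1, 3)), restrict(p, window_cells(1, 0, 3, 3))
print(glue(top, bottom) == p)                          # overlapping pieces glue back: True
```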

If the topological attribute "continuous" is replaced by "uniformly
continuous", "proximally continuous", "differentiable", "analytic", etc.,
we get a whole family of new presheaves on the same space. Moreover,
we get some other presheaves on X when we consider only a system of
neighborhoods, e.g., $\mathcal{N} = \{N(x;p) \mid x \in X,\ p > 0\}$, rather than the set
of all subsets of X.

(b) Presheaf of Histograms

Consider a picture function $p: X \to \mathbb{R}$ together with the
lattice of all windows $\langle \mathrm{Wind}, \subseteq\rangle$ of the cellular space X. Assign to
every window W a set of histograms $S_W$, or more precisely, a set of
distribution functions corresponding to a family of random variables,
characterizing certain features of the picture p.

As connecting mappings $\rho^W_V$, choose for $V \subseteq W$ an appropriate
stochastic transformation (stochastic matrix) transforming the elements
of $S_W$ into the elements of $S_V$. The presheaf axioms are readily verified,
provided the chosen matrices satisfy the composition rule (3). The structure
$\{S_W; \rho^W_V\}$ is a presheaf, called the presheaf of histograms.

Again, we can take the system of square windows
$\mathcal{M} = \{M(x;p) \mid x \in X,\ p > 0\}$ and consider another presheaf of histograms.

(c)  Pre  sheaf  of  Geometric  Hodela 

Let  p:  X  -*  R  be  a  picture  function  together  with  the  lattice 
of  windows  ‘'Mind,  c.^of  the  space  X.  Assign  to  every  window  a 
geometric  model  S^,  induced  by  the  picture  over  W.  The  geometric 
model  is  essentially  a  set  of  figure,  (line*,  circle*,  etc.)  together 
with  the  figure  attribute*  and  their  placement  rules. 

The connecting transformations $\varphi_{WV}\colon S_W \to S_V$ (for $V \subseteq W$) are restrictions or, in a more general situation, certain similarity functions assigning to every element $s_W \in S_W$ a most similar element of $S_V$. In the case of pictures with local gradients, some other homomorphisms may be of interest.
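The similarity-based connecting transformation can be sketched as a best-match lookup. In this hypothetical Python example a figure is a `(kind, size)` pair and the similarity score is invented purely for illustration; the text fixes neither the figure encoding nor the similarity function.

```python
def most_similar(figure, candidates, similarity):
    """Return the element of `candidates` (the model S_V) most similar to `figure`."""
    return max(candidates, key=lambda other: similarity(figure, other))

def connecting_map(model_w, model_v, similarity):
    """phi_WV: assign to every figure of S_W its most similar figure of S_V."""
    return {fig: most_similar(fig, model_v, similarity) for fig in model_w}

def similarity(a, b):
    """Hypothetical score: matching figure kind dominates, then closeness in size."""
    kind_penalty = 0 if a[0] == b[0] else 100
    return -(kind_penalty + abs(a[1] - b[1]))

# Hypothetical geometric models: figures as (kind, size) pairs.
S_W = [("line", 9.0), ("circle", 3.0)]
S_V = [("circle", 9.0), ("line", 12.0), ("line", 4.0), ("circle", 2.5)]

phi_WV = connecting_map(S_W, S_V, similarity)
print(phi_WV)
```

Ties are resolved in favor of the earliest candidate in $S_V$, because Python's `max` returns the first maximal element it encounters.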

(d) Presheaf of Feature Spaces

Let $p\colon X \to R$ be a picture function and let $\langle \mathrm{Con}, \subseteq \rangle$ be the lattice of convex subsets of $X$. Define $S_W$ for $W \in \mathrm{Con}$ as the linear space generated by the feature vectors associated with $p|W$ via measurement, and identify each connecting mapping $\varphi_{WV}$, $V \subseteq W$, with a linear transformation. Then $\{S_W; \varphi_{WV}\}$ is a model for the presheaf axioms.

Several other scene-analysis concepts turn out to have a sheaf-theoretic interpretation.

Simply put, a presheaf is a formal device which assigns to certain local areas of a topological space a specific structure in such a way that whenever two areas are in an inclusion relationship, the assigned structures are in a homomorphism relationship.

The reader should see by now the connection between presheaves and picture-structure identification by windowing. Often a presheaf $S$ on $\langle E, \le \rangle$ has a relatively simple structure "locally" about every point $a \in E$.

A suggestive picture of a presheaf over a window in a cellular space is given below. It is important not to confuse the presheaf structure with the picture function.

[Figure: a presheaf $S$ over a window in a cellular space; not reproduced in this copy.]


Frequently we are not interested in all windows of a cellular space but rather in a subset of windows. This is the case when the texture elements are large enough and we do not want to enter into their structure. In situations like this, the following notion appears to be relevant.

Let $E' \subseteq E$ be a partially ordered subset of $\langle E, \le \rangle$. Then the presheaf

$$\langle \{S_a \mid a \in E'\},\ \{\varphi_{ab} \mid a \le b \wedge a, b \in E'\} \rangle$$

is called the restriction of $S$ to $E'$ and is denoted $S|E'$. Thus, the presheaf structure is considered only over some points of $E$.
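For a finite base, the restriction $S|E'$ amounts to discarding everything indexed outside $E'$. The Python sketch below assumes a hypothetical encoding: the strict order on $E$ as a set of pairs $(a, b)$ with $a < b$, the structures $S_a$ as lists, and each connecting mapping $\varphi_{ab}\colon S_a \to S_b$ as a function (identities for $a = b$ are left implicit). It also checks the composition axiom $\varphi_{ac} = \varphi_{bc} \circ \varphi_{ab}$ before and after restricting.

```python
def restrict(order, sets, maps, subset):
    """Restriction S|E' of a finite presheaf to the subset E' of its base."""
    keep = set(subset)
    new_order = {(a, b) for (a, b) in order if a in keep and b in keep}
    new_sets = {a: s for a, s in sets.items() if a in keep}
    new_maps = {pair: f for pair, f in maps.items() if pair in new_order}
    return new_order, new_sets, new_maps

def satisfies_composition(order, sets, maps):
    """Check phi_ac(x) == phi_bc(phi_ab(x)) for all a < b < c and x in S_a."""
    for (a, b) in order:
        for (b2, c) in order:
            if b2 != b or (a, c) not in order:
                continue
            if any(maps[(a, c)](x) != maps[(b, c)](maps[(a, b)](x)) for x in sets[a]):
                return False
    return True

# Hypothetical presheaf on the chain 0 < 1 < 2.
order = {(0, 1), (1, 2), (0, 2)}
sets = {0: [0, 1, 2, 3], 1: [0, 1], 2: [0, 1]}
maps = {(0, 1): lambda x: x % 2,
        (1, 2): lambda x: x,
        (0, 2): lambda x: x % 2}

r_order, r_sets, r_maps = restrict(order, sets, maps, [1, 2])
print(sorted(r_order), sorted(r_sets))    # [(1, 2)] [1, 2]
print(satisfies_composition(order, sets, maps),
      satisfies_composition(r_order, r_sets, r_maps))    # True True
```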

Often several presheaves are of interest on the same base $E$. In situations like that we want to know how to relate them.

Let $S = \{S_a; \varphi_{ab}\}$ and $T = \{T_a; \psi_{ab}\}$ be two presheaves on $E$ of the same species $\Sigma$.

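Then by a homomorphism

$$\sigma\colon S \to T$$

of one presheaf into another we shall mean a family of $\Sigma$-homomorphisms $\sigma = \{\sigma_a \mid a \in E\}$ such that for all $a, b \in E$:

$$a \le b \implies \sigma_b \circ \varphi_{ab} = \psi_{ab} \circ \sigma_a.$$

The condition above is explained suggestively by stating that the following diagram of functions is commutative for all $a \le b$:

$$\begin{array}{ccc}
S_a & \xrightarrow{\ \sigma_a\ } & T_a \\
\Big\downarrow{\scriptstyle \varphi_{ab}} & & \Big\downarrow{\scriptstyle \psi_{ab}} \\
S_b & \xrightarrow{\ \sigma_b\ } & T_b
\end{array}$$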


In less formal terms, the way the structures in $S$ are related can be modeled in terms of the structures in $T$. This notion is important in the semantics of picture identification: we can regard $S$ as a "geometric" presheaf and $T$ as a "semantic" or "world-model" presheaf, and translate the geometric information of $S$ into world-model information of $T$.
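The commutativity condition can be checked mechanically for finite presheaves. The Python sketch below assumes a hypothetical finite encoding: the strict order as pairs $(a, b)$ with $a < b$, structures as lists, and each connecting mapping $\varphi_{ab}$, $\psi_{ab}$ and each component $\sigma_a$ as a function. The two presheaves and both candidate families are invented for illustration.

```python
def is_homomorphism(order, S, phi, psi, sigma):
    """True if sigma_b(phi_ab(x)) == psi_ab(sigma_a(x)) for all a < b and x in S_a."""
    return all(sigma[b](phi[(a, b)](x)) == psi[(a, b)](sigma[a](x))
               for (a, b) in order for x in S[a])

# Hypothetical presheaf S on the chain 0 < 1 < 2.
order = {(0, 1), (1, 2), (0, 2)}
S = {0: [0, 1, 2, 3], 1: [0, 1], 2: [0, 1]}
phi = {(0, 1): lambda x: x % 2, (1, 2): lambda x: x, (0, 2): lambda x: x % 2}

# Hypothetical presheaf T: T_a = {0, 1} for every a, identity connecting mappings.
psi = {pair: (lambda x: x) for pair in order}

sigma = {0: lambda x: x % 2, 1: lambda x: x, 2: lambda x: x}
not_sigma = {0: lambda x: 0, 1: lambda x: x, 2: lambda x: x}

print(is_homomorphism(order, S, phi, psi, sigma))      # True
print(is_homomorphism(order, S, phi, psi, not_sigma))  # False: the square fails for x = 1
```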

As pointed out, in picture-theoretic applications presheaves are essentially families of structures of certain species, interconnected by transformations expressing continuity.

With every presheaf $S$ of species $\Sigma$ on a base $\langle E, \le \rangle$ we associate two important structures of the same species $\Sigma$, provided that certain existence conditions are met. Before we show how this is done, two auxiliary notions are in order.

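They are the direct product $\mathrm{Prod}_{a \in E} S_a$ and the direct sum $\mathrm{Sum}_{a \in E} S_a$. For plain sets, $\mathrm{Prod}_{a \in E} S_a$ is just the Cartesian product of the sets $S_a$, and $\mathrm{Sum}_{a \in E} S_a$ is the disjoint union of these sets.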

Given a presheaf $S$ of species $\Sigma$, by the section (projective or inverse limit) of $S$ we mean the structure

$$\mathrm{Sect}(S) = \{\, s \in \mathrm{Prod}_{a \in E} S_a \mid \forall a, b\ (a \le b \rightarrow \varphi_{ab}(s_a) = s_b) \,\}.$$

Thus, $\mathrm{Sect}(S)$ is a substructure of the direct product $\mathrm{Prod}_{a \in E} S_a$ (provided that it exists).
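For finite structures the section can be computed by brute force: enumerate the direct product and keep the compatible families. The Python sketch below is illustrative only (its cost grows with the size of the product). The encoding (structures as lists, each connecting mapping $\varphi_{ab}\colon S_a \to S_b$ for $a < b$ as a function keyed by the pair $(a, b)$) and the example presheaf are hypothetical. The example base is the chain $0 < 1 < 2$, whose first element is $0$, so the output also illustrates the isomorphism between $\mathrm{Sect}(S)$ and the structure at the first element stated below.

```python
from itertools import product

def sect(base, sets, maps):
    """All families s = {a: s_a} in Prod S_a with phi_ab(s_a) == s_b for a < b."""
    sections = []
    for values in product(*(sets[a] for a in base)):
        s = dict(zip(base, values))
        if all(f(s[a]) == s[b] for (a, b), f in maps.items()):
            sections.append(s)
    return sections

# Hypothetical presheaf on the chain 0 < 1 < 2 (0 is the first element).
base = [0, 1, 2]
sets = {0: [0, 1, 2, 3], 1: [0, 1], 2: [0, 1]}
maps = {(0, 1): lambda x: x % 2,
        (1, 2): lambda x: x,
        (0, 2): lambda x: x % 2}

sections = sect(base, sets, maps)
print(len(sections))                 # 4 == len(sets[0])
print([s[0] for s in sections])      # [0, 1, 2, 3]: s -> s_0 is a bijection
```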

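Given a presheaf $S$ of species $\Sigma$, by the cosection (inductive or direct limit) of $S$ we understand the structure

$$\mathrm{Cosect}(S) = \mathrm{Sum}_{a \in E} S_a \,/ \equiv,$$

where $\mathrm{Sum}_{a \in E} S_a = \{\langle a, s \rangle \mid s \in S_a,\ a \in E\}$ denotes the direct sum of the family $\{S_a \mid a \in E\}$, and $\equiv$ is the smallest equivalence relation in the direct sum containing the binary relation $\sim$:

$$\langle a, s \rangle \sim \langle b, t \rangle \iff a \le b \wedge \varphi_{ab}(s) = t,$$

with $s \in S_a$ and $t \in S_b$.

Thus, $\mathrm{Cosect}(S)$ is a quotient structure of the direct sum $\mathrm{Sum}_{a \in E} S_a$ (provided that it exists).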
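The cosection of a finite presheaf can likewise be sketched in Python. The classes of $\equiv$ are the connected components of the graph whose edges are the pairs $\langle a, s \rangle \sim \langle b, \varphi_{ab}(s) \rangle$, which a union-find structure collects. The encoding and the example are hypothetical; on the chain $0 < 1 < 2$, whose last element is $2$, the number of classes equals the size of $S_2$, in line with the isomorphism for a last element stated below.

```python
def cosect(base, sets, maps):
    """Classes of Sum S_a modulo the smallest equivalence containing
    <a, s> ~ <b, phi_ab(s)> for a < b, computed with union-find."""
    parent = {(a, s): (a, s) for a in base for s in sets[a]}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (a, b), f in maps.items():
        for s in sets[a]:
            root_a, root_b = find((a, s)), find((b, f(s)))
            if root_a != root_b:
                parent[root_a] = root_b

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), []).append(x)
    return list(classes.values())

# Hypothetical presheaf on the chain 0 < 1 < 2 (2 is the last element).
base = [0, 1, 2]
sets = {0: [0, 1, 2, 3], 1: [0, 1], 2: [0, 1]}
maps = {(0, 1): lambda x: x % 2,
        (1, 2): lambda x: x,
        (0, 2): lambda x: x % 2}

classes = cosect(base, sets, maps)
print(len(classes))                  # 2 == len(sets[2])
for c in classes:
    print(sorted(c))
```

Each class contains exactly one element of the form $\langle 2, t \rangle$, which is the bijection between the classes and $S_2$.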

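Given a presheaf $S$, it can be shown that if $e$ is a first element of $E$, then the isomorphism

$$S_e \cong \mathrm{Sect}(S)$$

holds, and also, for a homomorphism $\sigma\colon S \to T$, the map it induces on sections corresponds to its component at $e$:

$$\mathrm{Sect}(\sigma) \cong \sigma_e\colon S_e \to T_e.$$

In other words, the structure $\mathrm{Sect}(S)$ can be identified with the structure $S_e$ assigned to the first element $e$ in $E$. In applications, $\mathrm{Sect}(S)$ corresponds to the structure of particular texture elements.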

Dually, if $e$ is a last element of $E$, then

$$S_e \cong \mathrm{Cosect}(S)$$

and

$$\mathrm{Cosect}(\sigma) \cong \sigma_e\colon S_e \to T_e.$$

Again, $\mathrm{Cosect}(S)$ can be identified with the structure $S_e$ assigned to the last element $e$ in $E$. In applications, $\mathrm{Cosect}(S)$ corresponds to the structure of the textured region.

Sheaves of Pictorial Structures of a Given Species

Sheaves are presheaves satisfying additional axioms. A definition of a sheaf in its full generality requires several additional technicalities. We shall therefore present a definition which is free of abstract conceptual constructions, general enough, and yet still relevant in picture analysis.

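Let $\langle \mathrm{Wind}, \subseteq \rangle$ be the lattice of windows over the cellular space $X$. Consider a presheaf $\{S_W; \varphi_{WV}\}$ over the window system $\langle \mathrm{Wind}, \subseteq \rangle$. Then this presheaf is called a sheaf if for any family of windows $\{W_i \mid i \in I\}$ such that $W_i \subseteq W$ for all $i \in I$ and $\bigcup_{i \in I} W_i = W$, the following isomorphism is valid:

$$S_W \cong \mathrm{Sect}(S|\{W_i \mid i \in I\}).$$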


Thus, loosely speaking, a sheaf is a system of structures over a lattice of windows, where each structure represents one particular texture.

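Dually, we call a presheaf $\{S_W; \varphi_{WV}\}$ a cosheaf if for any family of windows $\{W_i \mid i \in I\}$ such that $W_i \subseteq W$ for all $i \in I$ and $\bigcup_{i \in I} W_i = W$, the following isomorphism is valid:

$$S_W \cong \mathrm{Cosect}(S|\{W_i \mid i \in I\}).$$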


A more direct definition of a sheaf, with a fairly clear picture-theoretic interpretation, is given below.

Consider a presheaf $S = \{S_V; \varphi_{VU}\}$ of structures over a cellular space $X$, i.e., on the lattice of subsets $\langle \mathrm{Sub}(X), \subseteq \rangle$. Then $S$ is a sheaf over $X$ precisely when, for any family $\{V_i \mid i \in I\}$ of subsets of $X$ with $V = \bigcup_{i \in I} V_i$, the following two conditions are satisfied:


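(1) Uniqueness Axiom

$$s, t \in S_V \text{ and } s|V_i = t|V_i \text{ for all } i \in I \implies s = t.$$

(2) Existence Axiom

$$s_i \in S_{V_i}\ (i \in I) \text{ and } s_i|(V_i \cap V_j) = s_j|(V_i \cap V_j) \text{ for all } i, j \in I \implies \exists\, s \in S_V\colon\ s|V_k = s_k \text{ for all } k \in I,$$

where $s|U$ denotes $\varphi_{VU}(s)$ for $U \subseteq V$.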


Condition (1) says that if the structure elements $s$ and $t$ are locally identical, then they are also globally identical. That is, elements are uniquely determined by local data.

Condition (2) says that if we have local data which are compatible, they actually "patch together" to form global data.
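For the presheaf of picture functions (each $S_V$ is the set of functions from the cells of $V$ to gray values, with restriction as the connecting mapping), both axioms can be checked directly. The Python sketch below is a hypothetical illustration, not the author's procedure: a local datum is a dict from cell coordinates to gray values, compatibility is agreement on every overlap $V_i \cap V_j$, and gluing returns the function on $V = \bigcup V_i$ whose restrictions are the given data.

```python
def compatible(pieces):
    """True if every two local functions agree on their overlap V_i ∩ V_j."""
    items = list(pieces.values())
    for i, s_i in enumerate(items):
        for s_j in items[i + 1:]:
            if any(s_i[cell] != s_j[cell] for cell in s_i.keys() & s_j.keys()):
                return False
    return True

def glue(pieces):
    """Existence axiom: the function s on V = union of the V_i with s|V_i = s_i.

    Returns None when the local data are not compatible.
    """
    if not compatible(pieces):
        return None
    glued = {}
    for s_i in pieces.values():
        glued.update(s_i)
    return glued

def restrict(s, cells):
    """s|U for a set of cells U contained in the domain of s."""
    return {cell: s[cell] for cell in cells}

# Hypothetical 2x3 picture seen through two overlapping windows.
left = {(0, 0): 5, (0, 1): 7, (1, 0): 5, (1, 1): 7}
right = {(0, 1): 7, (0, 2): 9, (1, 1): 7, (1, 2): 9}

s = glue({"left": left, "right": right})
print(len(s))                                            # 6 cells
print(restrict(s, left) == left, restrict(s, right) == right)

# Changing one overlapping value breaks compatibility, so nothing glues.
print(glue({"left": left, "right": {**right, (1, 1): 0}}))    # None
```

Uniqueness holds in this example because a picture function on $V$ is determined cell by cell, and every cell of $V$ lies in some window $V_i$.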

This might appear to be an unduly sophisticated way of looking at the windowing process, in which, by overlapping windows, we are able to recover the unique structure of the picture from several local structures. The definition of a sheaf will turn out to be a test method for texture region identification.

The picture-theoretic substance of sheaves is this: a sheaf is essentially a system of "local coefficients". In picture-theoretic applications we start by assuming that a picture has certain local pictorial properties which are captured by a structure of certain species. We then express these properties in terms of the properties of the structure sheaf $S$ over a picture region. Finally, we apply the theory of sheaves to deduce certain global properties of the picture. Consequently, the importance of sheaves in scene analysis lies simply in giving relations between local and global properties of a scene.


References

[1] Andrews, H.C., Computer Techniques in Image Processing. New York, Academic Press.

[2] Attneave, F., and Arnoult, M.D., "The Quantitative Study of Shape and Pattern Perception", Psychol. Bull., 53 (1956), 452-471.

[3] Attneave, F., and Olson, R.K., "What Variables Produce Similarity Grouping?", The American J. of Psychology, (1970), 1-21.

[4] Bajcsy, R., "Computer Texture Analysis", in B.S.M. Gransberg (Ed.), Proceedings of the Third Hawaii International Conference on System Sciences, Western Periodical Co., 1970, 1010-1015.

[5] Beck, J., "Perceptual Grouping Produced by Line Figures", Perception and Psychophysics, 2 (1967), 491-495.

[6] Bennett, R.S., "Intrinsic Dimensionality of Signal Collections", IEEE Trans. Inform. Theory, Vol. IT-15 (1969), 517-525.

[7] Brice, C.R., and Fennema, C.L., "Scene Analysis Using Regions", Artificial Intelligence, Vol. 1, No. 3 (Fall 1970).

[8] Binford, T., "A Visual Preprocessor", Internal Report, Project MAC, MIT, Cambridge, Mass. (1970).

[9] Blakemore, C., and Campbell, F.W., "On the Existence of Neurones in the Human Visual System Selectively Sensitive to the Orientation and Size of the Retinal Images", J. Physiol., 203 (1969), 237-260.

[10] Bredon, G.E., Sheaf Theory. New York, McGraw-Hill, 1967.

[11] Brill, E.L., "Character Recognition via Fourier Descriptors", WESCON Technical Papers, Session 25, Qualitative Pattern Recognition Through Image Shaping, Los Angeles, August 1968.

[12] Brown, D.R., and Owen, D.H., "The Metrics of Visual Form", Psychol. Bull., 68 (1968), 243-259.

[13] Campbell, F.W., and Kulikowski, J.J., "Orientational Selectivity of the Human Visual System", J. Physiol., 187 (1966), 437-445.

[14] Campbell, F.W., and Robson, J.G., "Application of Fourier Analysis to the Visibility of Gratings", J. Physiol., 197 (1968), 551-566.

[15] Clowes, M.B., "Picture Syntax", in S. Kaneff (Ed.), Picture Language Machines, New York, Academic Press, 1970.

[16] Dacey, M.F., "Description of Line Patterns", Quantitative Geography, Northwestern Univ. Studies in Geog., No. 13, Evanston, 1967, 277-287.


[17] [Entry illegible in the source copy.]

[18] [Entry largely illegible in the source copy; legible fragment: "... Recognition and Picture Processing".]

[19] Enroth-Cugell, C., and Robson, J.G., "The Contrast Sensitivity of Retinal Ganglion Cells of the Cat", J. Physiol., 187 (1966), 517 ff.

[20] Falk, G., "Computer Interpretation of Imperfect Line Data as a Three-dimensional Scene", AIM-132, Artificial Intelligence Project, Stanford University, Stanford, California (1970).

[21] Frisch, H.L., and Julesz, B., "Figure-ground Perception and Random Geometry", Perception and Psychophysics, 1 (1966), 389-398.

[22] Flock, H.R., "Optical Texture and Linear Perspective as Stimuli for Slant Perception", Psychol. Review, 72 (1965), 505-514.

[23] Freeman, R.B., Jr., "Theory of Cues and the Psychophysics of Visual [...]", Psychon. Monogr. Suppl., Vol. 5, No. 13 (1970).

[24] Fukunaga, K., and Olson, D.R., "An Algorithm for Finding Intrinsic Dimensionality of Data", IEEE Trans. on Computers, Vol. C-20, No. 2.

[25] Gibson, J.J., The Perception of the Visual World. Boston, Houghton Mifflin, 1950.

[26] Goodman, J.W., Introduction to Fourier Optics. San Francisco, McGraw-Hill Book Co., 1968.

[27] Guzman, A., "Decomposition of a Visual Scene into Three-dimensional Bodies", Proc. Fall Joint Computer Conf., AFIPS 33, Washington, Thompson Book, 1968, 291-304.

[28] Hawkins, J.K., "Textural Properties for Pattern Recognition", in B.S. Lipkin (Ed.), Picture Processing and Psychopictorics, New York, Academic Press, 1970.

[29] Horn, B.K.P., "Shape from Shading: A Method for Obtaining the Shape of a Smooth Opaque Object from One View", MAC-TR-79, Project MAC, MIT, Cambridge, Mass. (November 1970).

[30] Huang, T.S., "Digital Fourier Analysis".

[31] Hueckel, M.H., "An Operator Which Locates Edges in Digitized Pictures", J. of ACM, Vol. 18, No. 1 (Jan. 1971), 113-125.


[32] Hubel, D.H., "Single Unit Activity in Lateral Geniculate Body and Optic Tract of Unrestrained Cats", J. Physiol., 150 (1960), 91-104.

[33] Hubel, D.H., and Wiesel, T.N., "Receptive Fields of Optic Nerve Fibres in the Spider Monkey", J. Physiol., 154 (1960), 572-580.

[34] Hubel, D.H., and Wiesel, T.N., "Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex", J. Physiol., 160 (1962), 106-154.

[35] Hubel, D.H., and Wiesel, T.N., "Receptive Fields and Functional Architecture of Monkey Striate Cortex", J. Physiol., 195 (1968), 215-243.

[36] Johnson, S.C., "Hierarchical Clustering Schemes", Psychometrika, 32 (1967), 241-254.

[37] Julesz, B., "Visual Pattern Discrimination", IRE Trans. on Information Theory, IT-8 (1962), 84-92.

[38] Julesz, B., "Some Recent Studies in Vision Relevant to Form Perception", in W. Wathen-Dunn (Ed.), Models for the Perception of Speech and Visual Form, Cambridge, Mass., MIT Press, 1967, 136-154.

[39] Julesz, B., and Stromeyer, C.F., "Masking of Spatial Gratings by Filtered One-dimensional Visual Noise", talk at the 10th Annual Meeting of the Psychonomic Society, San Antonio, Texas, Nov. 5-7, 1970.

[40] Julesz, B., Foundations of Cyclopean Perception. Chicago, The University of Chicago Press, 1971.

[41] Kirsch, R.A., "Computer Interpretation of English Text and Picture Patterns", IEEE Trans. on Electronic Computers, EC-13, No. 4 (1964), pp. 63-76.

[42] Kovalevsky, V.A., "Present and Future of Pattern Recognition Theory", in Proc. IFIP Congress, Vol. 1, Spartan Books, Inc., 1965, pp. 37-43.

[43] Krakauer, L.J., "Computer Analysis of Visual Properties of Curved Objects", MAC-TR-82, Project MAC, MIT, Cambridge (May 1971).

[44] Kruskal, J.B., "Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis", Psychometrika, 29 (1964), 1-27, 115-129.

[45] Kuffler, S.W., "Discharge Patterns and Functional Organization of Mammalian Retina", J. Neurophysiol., 16 (1953), 37-68.

[46] Ledley, R.S., Rotolo, L.S., Golab, I.J., Jacobson, J.D., Ginsberg, M.D., and Wilson, J.B., "FIDAC: Film Input to Digital Automatic Computer and Associated Syntax-directed Pattern Recognition Programming System", in J.T. Tippett (Ed.), Optical and Electro-Optical Information Processing, Cambridge, Mass., MIT Press, 1965.

[47] Levine, M.D., "Feature Extraction: A Survey", Proc. IEEE, 57 (1969), 1391-1407.

[48] Lendaris, G.G., and Stanley, G.L., "Diffraction Pattern Sampling for Automatic Pattern Recognition", Proc. IEEE, 58 (1970), 198-216.

[49] Lipkin, L.E., et al., "The Analysis, Synthesis, and Description of Biological Images", Ann. N.Y. Acad. Sci., 128 (1966), 984-1012.

[50] Miller, W.F., and Shaw, A.C., "Linguistic Methods in Picture Processing: A Survey", Proc. AFIPS 1968 FJCC, Vol. 33, Washington, D.C., Thompson Book Co., pp. 279-290.

[51] Mountcastle, V.B., "Modality and Topographic Properties of Single Neurons of Cat's Somatic Sensory Cortex", J. Neurophysiol., 20 (1957), 408-434.

[52] Munson, J.H., "Experiments on the Recognition of Hand-Printed Text: Part I, Character Recognition", Proc. FJCC, 1968, 1125-1138.

[53] Nachmias, J., Sachs, M.B., and Robson, J.G., "Independent Spatial-Frequency Channels in Human Vision", J. Opt. Soc. Am., 59 (1969), 1538.

[54] Narasimhan, R., "Syntax-directed Interpretation of Classes of Pictures", Comm. ACM, 9 (1966), 166-173.

[55] Narasimhan, R., "Picture Languages", in S. Kaneff (Ed.), Picture Language Machines, New York, Academic Press, 1970.

[56] Novikoff, A.B.J., "Integral Geometry as a Tool in Pattern Perception", in H. von Foerster and G.W. Zopf (Eds.), Principles of Self-Organization, New York, Pergamon Press, 1962.

[57] Pantle, A., and Sekuler, R.W., "Contrast Response of Human Visual Mechanisms Sensitive to Orientation and Direction of Motion", Vision Res., 9 (1969), 397-406.

[58] Papoulis, A., Systems and Transforms with Applications in Optics, New York, McGraw-Hill Book Co., 1968.

[59] Pfaltz, J.L., "Web Grammars and Picture Description", Techn. Report 70-138, GJ-754, University of Maryland, Computer Science Center, 1970.

[60] Pingle, K.K., "Visual Perception by a Computer", in A. Grasselli (Ed.), Interpretation and Classification of Images, New York, Academic Press, 1969.

[61] Pollehn, H., and Roehrig, H., "Effect of Noise on the Modulation Transfer Function of the Visual Channel", J. Opt. Soc. Am., 60 (1970), 842-848.

[62] Powell, T.P.S., and Mountcastle, V.B., "The Cytoarchitecture of the Postcentral Gyrus of the Monkey Macaca Mulatta", Bull. Johns Hopkins Hosp., 105 (1959), 108-131.

[63] Prewitt, J.M.S., and Mendelsohn, M.L., "The Analysis of Cell Images", Ann. N.Y. Acad. Sci., 128 (1966), 1035-1053.

[64] Quam, L.H., "Computer Comparison of Pictures", AIM-144, Artificial Intelligence Project, Stanford University, Stanford, California (May 1971).


[65] Rosenfeld, A., "Automatic Recognition of Basic Terrain Types from Aerial Photographs", Photogram. Eng., 28 (1962), 115-132.

[66] Rosenfeld, A., "On Models for the Perception of Visual Texture", in Models for the Perception of Speech and Visual Form, Proceedings of a Symposium, Nov. 11-14, 1964.

[67] Rosenfeld, A., and Pfaltz, J.L., "Distance Functions on Digital Pictures", Pattern Recognition, Vol. 1 (1968), 33-61.

[68] Rosenfeld, A., Picture Processing by Computer. New York, Academic Press, 1969.

[69] Rosenfeld, A., and Pfaltz, J.L., "Web Grammars", TR-69-84, Computer Science Center, University of Maryland, College Park, Maryland (1969).

[70] Rosenfeld, A., and Thurston, M., "Edge and Curve Detection for Visual Scene Analysis", IEEE Trans. on Computers, Vol. C-20, No. 5, May 1971.

[71] Shaw, A.C., "The Formal Description and Parsing of Pictures" [remainder of entry illegible in the source copy].

[72] Shepard, R.N., "The Analysis of Proximities: Multidimensional Scaling with an Unknown Distance Function, I and II", Psychometrika, 27 [...].

[73] [Entry largely illegible in the source copy; legible fragment: "... Krishnaiah ..., New York, Academic Press, 1966".]

[74] Sobel, I., "Camera Models and Machine Perception", AIM-121, Artificial Intelligence Project, Stanford University, Stanford, California (1970).

[75] Tenenbaum, J.M., "Accommodation in Computer Vision", AIM-134, Artificial Intelligence Project, Stanford University, Stanford, California (1970).

[76] Ward, J.H., Jr., "Hierarchical Grouping to Optimize an Objective Function", J. Am. Statistical Assoc., 58 (1963), 236-244.

[77] Wertheimer, M., "Experimentelle Studien über das Sehen von Bewegung", Z. Psychol., 61 (1912), pp. 161-265. Translated in large part in: T. Shipley (Ed.), Classics in Psychology, New York, Philosophical Library, 1961.

[78] Wiesel, T.N., and Hubel, D.H., "Spatial and Chromatic Interaction in the Lateral Geniculate Body of the Rhesus Monkey", J. Neurophysiol., 29 (1966), 1115-1156.

[79] Wolf, P.D., Term Project, MIT (1970).


